TITLE OF THE INVENTION
REFERENCE CLONES AND SEQUENCES FOR 
NON-SUBTYPE B ISOLATES OF HUMAN 
IMMUNODEFICIENCY VIRUS TYPE 1 
This work was funded by grants R01 AI25291 and N01 AI35170 from the National
Institutes of Health. Therefore, the government may have certain rights in the 
invention. 

FIELD OF THE INVENTION 
The present invention is in the field of virology. The invention relates 
to the nucleotide sequences of the genomes of 11 molecular clones for non-subtype B
isolates of human immunodeficiency virus type 1 (HIV-1), and nucleic acids derived 
therefrom. This invention also relates to peptides encoded by and/or derived from the 
nucleic acid sequences of these molecular clones, and host cells containing these 
nucleic acid sequences and peptides. The invention also relates to diagnostic methods, 
kits and immunogens which employ the nucleic acids, peptides and/or host cells of the 
invention. 

BACKGROUND OF THE INVENTION 
A critical question facing current AIDS vaccine development efforts is 
to what extent HIV-1 genetic variation has to be considered in the design of candidate 
vaccines (11,21,42,72). Phylogenetic analyses of globally circulating viral strains 
have identified two distinct groups of HIV-1, a major M group and an O group 
(33,45,61,62). Within the M group, ten sequence subtypes (A-J) have been proposed 
(29,30,45,72). Sequence variation among viruses belonging to these different lineages 
is extensive, with envelope amino acid sequence variation ranging from 24% between 
different subtypes to 47% between the two different groups. Given this extent of 
diversity, the question has been raised whether immunogens based on a single virus 
strain can be expected to elicit immune responses effective against a broad spectrum of 
viruses, or whether vaccine preparations should include mixtures of genetically 
divergent antigens and/or be tailored toward locally circulating strains (11, 21, 42, 72).
This is of particular concern in developing countries where multiple subtypes of HIV-1
are known to co-circulate and where subtype B viruses, which have been the source
for most current candidate vaccine preparations (10, 21), are rare or nonexistent (5, 24,
40, 72).

Although the extent of global HIV-1 variation is well defined, little is 
known about the biological consequences of this genetic diversity and its impact on 
cellular and humoral immune responses in the infected host. In particular, it remains 
unknown whether subtype specific differences in virus biology exist that need to be 
considered for vaccine design. Only a comprehensive analysis of genetically defined 
representatives of the various groups and subtypes will address the question of 
whether certain variants differ in fundamental viral properties and whether such 
differences will need to be incorporated into vaccine strategies. Obviously, such 
studies require well-characterized reference reagents, in particular full length and 
replication competent molecular clones that can be used for functional and biological 
studies. 

Full-length reference sequences representing the various subtypes are 
also urgently needed for phylogenetic comparisons. Until about 1994, it was generally 
thought that individuals do not become infected with multiple distinct HIV-1 strains, 
and so the possibility that recombination between divergent viruses could contribute to 
the evolution of HIV-1 was not widely considered. However, recent analyses of 
subgenomic (23,52,54,58) as well as full-length HIV-1 sequences (7,18,53,60) 
identified a surprising number of HIV-1 strains which clustered in different subtypes 
in different parts of their genome. All of these originated from geographic regions 
where multiple subtypes co-circulated and are the results of co-infections with highly 
divergent viruses (52,60,62). 

Recombinant viruses can be detected because their phylogenetic 
affinities vary depending on the region of genome analyzed. A useful initial approach 
is to examine the extent of sequence divergence/similarity between a new sequence 
and a bank of reference sequences of different subtypes, for example as a diversity plot 
(18), or using the RIP program (75); if the extent of relative similarity to different 
subtypes varies along the sequence, this may indicate that the sequence is a 
recombinant. However, fuller investigation must involve a phylogenetic approach, 
comparing trees derived by analyses of different regions of the genome, and assessing 
the confidence of phylogenetic clustering by a statistical approach such as the 
bootstrap. A thorough analysis would involve taking a window of sequence of a
certain size, and moving this window along the genome in steps of a defined size, 
generating perhaps hundreds of trees for visual examination in the process. There are 
at least two short cuts. One is to analyze only a few windows, defining selected 
regions according to the output of the diversity analysis. Another is to not examine the 
entire phylogenetic tree of all subtypes, but to focus on one particular phylogenetic 
question. Thus, if the initial analyses suggest that a sequence may be a recombinant 
between two particular subtypes, it is possible to ask simply what is the bootstrap 
value for the clustering of the new sequence with one or another particular subtype, 
and plot these values as a function of position along the genome; this is the basis of the 
"bootscanning" approach (57). Once the subtypes putatively involved in the 
recombination event have been identified, and the crossover points have been 
approximately localized, more precisely defined breakpoints can be determined, and 
their statistical significance assessed, using informative site analysis (19, 52, 53). 
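
The sliding-window comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used to generate the figures of this application: it assumes the query and reference sequences have already been aligned and gap-stripped to equal length, it uses the uncorrected proportion of differing sites as the distance, and it performs no tree building or bootstrapping. The function names and the toy sequences in the usage note are hypothetical.

```python
from typing import Dict, List, Tuple


def p_distance(a: str, b: str) -> float:
    """Uncorrected distance: fraction of aligned sites at which a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    if not a:
        return 0.0
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def diversity_scan(
    query: str,
    references: Dict[str, str],
    window: int = 500,
    step: int = 10,
) -> List[Tuple[int, Dict[str, float]]]:
    """For each window start, return the distance from the query to every reference."""
    results = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        distances = {
            name: p_distance(query[start:end], ref[start:end])
            for name, ref in references.items()
        }
        results.append((start, distances))
    return results


def closest_reference_by_window(
    scan: List[Tuple[int, Dict[str, float]]]
) -> List[Tuple[int, str]]:
    """Name the least divergent reference in each window.

    A change in the closest reference along the genome is only a hint of
    recombination; it does not replace phylogenetic analysis.
    """
    return [(start, min(d, key=d.get)) for start, d in scan]
```

For example, with toy references `{"subtype_A": "A" * 40, "subtype_C": "C" * 40}` and a query of `"A" * 20 + "C" * 20`, a scan with `window=10, step=10` reports `subtype_A` as closest in the first two windows and `subtype_C` in the last two, which is the pattern that would prompt a fuller bootstrap-based analysis.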

Detailed phylogenetic characterization revealed that most of the 
recombinant viruses have a complex genome structure with multiple points of 
crossover (7,18,53,60). Some recombinants, like the "subtype E" viruses, which are in 
fact A/E recombinants (7,18), have a wide-spread geographic dissemination and are 
responsible for much of the Asian HIV-1 epidemic (69,70). In other areas, 
recombinants appear to be generated with increasing frequencies as many randomly 
chosen isolates exhibit evidence of mosaicism (4,8,31,66,71). 

Since recombination provides the opportunity for evolutionary leaps 
with genetic consequences that are far greater than the steady accumulation of 
individual mutations, the impact of recombination on viral properties must be 
monitored. Full-length non-recombinant reference sequences for all major HIV-1 
groups and subtypes are thus needed to map and characterize the extent of inter- 
subtype recombination. 

Non-subtype B viruses cause the vast majority of new HIV-1 infections 
worldwide. Although their geographic dissemination is carefully monitored, their 
immunogenic and biological properties remain largely unknown, in part because well- 
characterized virological reference reagents are lacking. In particular, full length 
clones and sequences are rare, since subtype classification is frequently based on small 
PCR-derived viral fragments. There are currently only five full length, non-
recombinant molecular clones available for viruses other than subtype B (45), and 
these represent only three of the proposed (group M) subtypes (A, C and D). 
Moreover, only three clones (all derived from subtype D viruses) are replication 
competent and thus useful for studies requiring functional gene products (45,48,65). 
Given the unknown impact of genetic variation on correlates of immune protection, 
subtype specific reagents are critically needed for phylogenetic, immunological and 
biological studies. 

SUMMARY OF INVENTION 
The present invention pertains to the isolation and characterization of
the genomic sequences of 11 molecular clones for non-subtype B isolates of
human immunodeficiency virus type 1 (HIV-1), and nucleic acids derived therefrom.
Of these 11 molecular clones, 94IN476.104, 96ZM651.8, and 96ZM751.3 are
non-mosaic reference clones of HIV-1 subtype C; 93BR020.1 is a reference clone of
HIV-1 subtype F; 90CF056.1 is a reference clone of HIV-1 subtype H; 92RW009.6 is a
double recombinant of HIV-1 subtypes A/C; 92NG083.2 and 92NG003.1 are double
recombinants of HIV-1 subtypes A/G; 93BR029.4 is a double recombinant of HIV-1
subtypes B/F; 94CY017.41 is a double recombinant of HIV-1 subtype A and a new, as
yet undefined, subtype; and 94CY032.3 is a triple recombinant of HIV-1 subtypes
A/G/I.

In particular, the present invention relates to nucleic acids comprising 
the genomic sequences of one or more of these 11 clones for non-subtype B HIV-1
isolates, as well as nucleic acids comprising the complementary (or antisense)
sequence of one or more of the genomic sequences of these 11 clones, and nucleic
acids derived therefrom. 

The invention also relates to vectors comprising the nucleic acid 
genomic sequence of one or more of these 11 clones, as well as nucleic acids
comprising the complementary (or antisense) sequence of one or more of the genomic 
sequences of these clones, and nucleic acids derived therefrom. 

The invention also relates to cultured host cells comprising the nucleic 
acid genomic sequences of one or more of these 11 clones for non-subtype B HIV-1
isolates, as well as nucleic acids comprising the complementary (or antisense)
sequence of one or more of the genomic sequences of these clones, and nucleic acids 
derived therefrom. 

The invention also relates to host cells containing vectors comprising 
the genomic sequences of one or more of these 11 clones for non-subtype B HIV-1
isolates, as well as nucleic acids comprising the complementary (or antisense) 
sequence of one or more of the genomic sequences of these clones, and nucleic acids 
derived therefrom. 

The invention also relates to synthetic or recombinant polypeptides 
encoded by or derived from the nucleic acid sequences of one or more of the genomes 
of these 11 clones for non-subtype B HIV-1 isolates, and fragments thereof.

The invention also relates to methods for producing the polypeptides of 
the invention in culture using one or more of these 11 clones for non-subtype B HIV-1
viruses or nucleic acids derived therefrom, including recombinant methods for 
producing the polypeptides of the invention. 

The invention further relates to methods of using the polypeptides of 
the invention as immunogens to stimulate an immune response in a mammal, such as 
the production of antibodies, or the generation of cytotoxic or helper T-lymphocytes. 

The invention also relates to methods of using the polypeptides of the 
invention to detect antibodies which immunologically react with non-subtype B HIV-1 
viruses in a mammal or in a biological sample. 

The invention also relates to kits for the detection of antibodies specific 
for non-subtype B HIV-1 viruses in a biological sample where said kit contains at least 
one polypeptide encoded by or derived from the nucleic acid sequences of the 
invention. 

The invention also relates to antibodies which immunologically react
with the virions of one or more of these 11 viruses and/or their encoded polypeptides.

The invention also relates to methods of detecting virions of non- 
subtype B HIV-1 viruses and/or their encoded polypeptides, or fragments thereof, 
using antibodies of the invention. 

The invention also relates to kits for detecting the virions of non- 
subtype B HIV-1 viruses and/or their encoded polypeptides, wherein the kit comprises 
at least one antibody of the invention.

The invention also relates to a method for detecting the presence of 
non-subtype B HIV-1 viruses in a mammal or a biological sample, said method 
comprising analyzing the DNA or RNA of a mammal or a sample for the presence of 
the RNAs, cDNAs or genomic DNAs which will hybridize to a nucleic acid derived 
from one or more of these 11 non-subtype B HIV-1 molecular clones. Usually, when
a completely complementary probe is used, high stringency conditions are desirable in 
order to prevent false positives. However, conditions of high stringency should only 
be used if the probes are complementary to target regions which lack heterogeneity. 
The stringency of hybridization is determined by a number of factors during 
hybridization and during the washing procedure, including temperature, ionic strength, 
length of time, and concentration of formamide, if any. The nucleic acid sequences 
used in probes should be unique to HIV, i.e., the nucleic acid sequences should be 
absent from individuals not infected with HIV. 
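
To illustrate how the factors listed above interact, the following sketch evaluates a widely quoted empirical approximation for the melting temperature (Tm) of long DNA-DNA hybrids. The formula and its coefficients are textbook values and are not taken from this specification; they are an assumption of the example, intended only to show that raising the formamide concentration or lowering the ionic strength lowers Tm and therefore raises stringency at a fixed temperature. The function name and parameters are hypothetical.

```python
import math


def approx_tm_celsius(
    na_molar: float,
    gc_percent: float,
    length_bp: int,
    formamide_percent: float = 0.0,
) -> float:
    """Approximate Tm (deg C) of a DNA-DNA hybrid.

    Assumed empirical form (not from this specification):
        Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC)
             - 0.61*(%formamide) - 500/length
    """
    if na_molar <= 0:
        raise ValueError("sodium concentration must be positive")
    if length_bp <= 0:
        raise ValueError("hybrid length must be positive")
    return (
        81.5
        + 16.6 * math.log10(na_molar)
        + 0.41 * gc_percent
        - 0.61 * formamide_percent
        - 500.0 / length_bp
    )
```

Hybridization and wash temperatures are then usually chosen some degrees below the estimated Tm, with a smaller margin corresponding to higher stringency; the appropriate margin depends on the probe and on the expected heterogeneity of the target region.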

The invention also provides diagnostic kits for the detection of non- 
subtype B HIV-1 viruses in a mammal using the nucleic acids of the invention. In one 
embodiment, the kit comprises nucleic acids having sequences useful as hybridization 
probes in determining the presence or absence of the RNAs, cDNAs or genomic 
DNAs of non-subtype B HIV-1 viruses. In another embodiment, the kit comprises 
nucleic acids having sequences useful as primers for reverse-transcription polymerase 
chain reaction (RT-PCR) analysis of RNA for the presence of HIV-1 viruses in a 
biological sample. 

The invention further relates to isolated and substantially purified 
nucleic acids, polypeptides and/or antibodies of the invention. 

The invention further relates to compositions comprising one or more 
of the nucleic acids, polypeptides and/or antibodies of the invention. 

The invention also relates to computer-generated alignments of the 
nucleic acid sequences of the viral genomes of the 11 clones of this invention,
as well as alignments of the encoded amino acid sequences. These sequence 
alignments serve to highlight regions of homology and non-homology between 
different sequences and hence, can be used in preparing diagnostic reagents as 
described herein. 

BRIEF DESCRIPTION OF THE FIGURES
Fig. 1. Phylogenetic relationships of the 11 viral genomes described in 
this patent application (highlighted) to representatives of all major HIV-1 (group M) 
subtypes in gag (A) and env (B) regions. Trees were constructed from full-length 
gag and env nucleotide sequences using the neighbor joining method (see text for 
details of methodology). Horizontal branch lengths are drawn to scale; vertical 
separation is for clarity only. Values at the nodes indicate the percent bootstraps in 
which the cluster to the right was supported (bootstrap values of 75% and higher are 
shown). Asterisks denote hybrid genomes as determined by additional analyses. 
Brackets at the right represent the major sequence subtypes of HIV-1 group M. Trees 
were rooted by using SIVcpzGAB as an outgroup. 

Fig. 2. Diversity plots comparing the sequence relationships of the 11
viral genomes described in this patent application to each other and to reference
sequences from the database. In each of panels A-J, the sequence named above the
plots is compared to the sequences listed at the right. U455, LAI, C2220, and NDK
are published reference sequences for subtypes A, B, C and D, respectively. Distance
values were calculated for a window of 500 bp moved in steps of 10 nucleotides. The
x-axis indicates the nucleotide positions along the alignment (gaps were stripped and
removed from the alignment). The positions of the start codons of the gag, pol, vif,
vpr, env, and nef genes are shown. The y-axis denotes the distance between the
viruses compared (0.05 = 5% divergence).

Fig. 3. Exploratory tree analysis. Neighbor joining trees were 
constructed for a 500 bp window moved in increments of 100 bp along the multiple 
genome alignment. Trees depicting discordant branching orders among four of the 11
sequences included in this patent application are shown in panels A-I (hybrid 
sequences are boxed). The position of each tree in the alignment is indicated; subtypes 
are identified by brackets. Numbers at nodes indicate the percentage of bootstrap 
values with which the adjacent cluster is supported (only values above 80% are 
shown). Branch lengths are drawn to scale. 

Fig. 4. Recombination breakpoint analysis for 92RW009.6 and
93BR029.4. (A) Bootstrap plots depicting the relationship of 92RW009.6 to
representatives of subtypes A and C, respectively. Trees were constructed from the
multiple genome alignment and the magnitude of the bootstrap value supporting the
clustering of 92RW009.6 with U455 and 92UG037.1 (subtype A), or C2220 and
92BR025.8 (subtype C), respectively, was plotted for a window of 500 bp moved in
increments of 10 bp along the alignment. Regions of subtype A or C origin are
identified by very high bootstrap values (>90%). Points of cross-over of the two
curves indicate recombination breakpoints. The beginnings of the gag, pol, vif, vpr, env
and nef open reading frames are shown. The y-axis indicates the percent bootstrap
replicates which support the clustering of 92RW009.6 with representatives of the
respective subtypes. (B) Bootstrap plots depicting the relationship of 93BR029.4 to
representatives of subtypes B and F, respectively. Analyses are as in (A), except that
bootstrap values supporting the clustering of 93BR029.4 with SF2, OYI, MN, LAI and
RF (subtype B), or 93BR020.1 (subtype F), respectively, were plotted. Subtype D
viruses were excluded from this analysis because of their known close relationship 
with subtype B viruses. 

Fig. 5. Recombination breakpoint analysis of 92NG083.2 and
92NG003.1. Neighbor joining trees depicting discordant branching orders of
92NG003.1 and 92NG083.2 in regions delineated by breakpoints identified by
distance plots (not shown) are shown in panels A-D (hybrid sequences are boxed). 
The position of each tree in the alignment is indicated; subtypes are identified by 
brackets. Numbers at nodes indicate the percentage of bootstrap values with which 
the adjacent cluster is supported (only values above 80% are shown). Branch lengths 
are drawn to scale. 

Fig. 6. Inferred structure of the five recombinant genomes included in 
this patent application. LTR sequences were not analyzed and are thus shown as open 
boxes. 

Fig. 7. Subtype specific genome features. (A) Alignment of deduced 
Tat (region encoded by second exon) amino acid sequences. Consensus sequences 
were generated for available representatives of all major subtypes (question marks 
indicate sites at which fewer than 50% of the viruses contain the same amino acid 
residue). Dashes denote sequence identity with the consensus sequence, while dots 
represent gaps introduced to optimize alignments. A vertical box highlights a
premature Tat protein truncation (asterisk) which is present in 11 of 15 subtype D, and
4 of 52 subtype B viruses (frequencies are listed in the column on the right). (B) 
Alignment of deduced Rev (region encoded by the second exon) protein sequences. 
(C) Alignment of deduced Vpu protein sequences. 

Fig. 8. Generation of replication competent proviral clones from long
PCR products. The general construction scheme of a replication competent provirus 
from two separately amplified genomic regions is depicted. 

Fig. 9. Diversity plots comparing the sequence relationships of 
94CY032.3 to reference sequences from the database. 92UG037.1, LAI, C2220, and 
ELI are reference sequences for subtypes A, B, C and D, respectively. 92NG083.2 is a 
known G/A recombinant, but contains only a small subtype A fragment between 
position 4200 and 4800 (there is presently no full length non-mosaic subtype G 
reference sequence available). Distance values were calculated for a window of 400 
bp moved in steps of 10 nucleotides. The x-axis indicates the nucleotide positions 
along the alignment (gaps were stripped and removed from the alignment). The 
positions of the start codons of the gag, pol, vif, vpr, env, and nef genes are shown. 
The y-axis denotes the distance between the viruses compared (0.05 = 5% difference). 

Fig. 10. Exploratory tree analysis. Neighbor joining trees were 
constructed for a 400 bp window moved in increments of 10 bp along the multiple 
genome alignment. Trees in panel A-K depict the discordant branching orders for 
94CY032.3 (highlighted). The position of each tree in the alignment is indicated; 
subtypes are identified by brackets. Numbers at nodes indicate the percentage of 
bootstrap values with which the adjacent cluster is supported (only values above 80% 
are shown). Branch lengths are drawn to scale. 

Fig. 11. Bootstrap plot analysis to map recombination breakpoints in 
94CY032.3. Bootscanning was performed essentially as described, plotting the 
magnitude of the bootstrap value supporting the clustering of 94CY032.3 with 
92UG037.1 (subtype A) in comparison with that of 94CY032.3 and 92NG083.2
("subtype G") for a window of 400 bp moved in increments of 10 bp along the 
alignment. Regions of subtype A or G origin are identified by very high bootstrap 
values (>80%). The location of eight recombination crossovers is indicated.
Breakpoint analysis between position 4200 and 4800 was not possible due to the
recombinant nature of 92NG083.2. The beginnings of the gag, pol, vif, vpr, env and nef
open reading frames are shown. The y-axis indicates the percent bootstrap replicates
which support the clustering of 94CY032.3 with representatives of the respective
subtypes. 

Fig. 12. Recombination breakpoint analysis of 94CY032.3 in the
vif/vpr region. Neighbor joining trees depicting the position of 94CY032.3 in regions 
flanking the breakpoints identified by distance plot analysis (not shown). Trees were 
constructed from the genomic regions indicated. Subtypes are identified by brackets. 
Four sequences from Mali represent subtype G (these are the only available subtype G 
reference sequences in this region, since all other "subtype G" viruses contain A 
fragments). Numbers at nodes indicate the percentage of bootstrap values with which 
the adjacent cluster is supported (only values above 80% are shown). Branch lengths 
are drawn to scale. 

Fig. 13. Nucleotide sequence alignment of the 11 near full-length HIV-1
sequences included in this patent application. Sequences were aligned using
CLUSTAL W and adjusted manually using the sequence editor MASE. Dots indicate
gaps introduced to optimize the alignment. The beginning and end of all open reading 
frames are indicated by arrows above or below the alignment. The homologies 
between the sequences of nucleotides in the eleven independent clones are indicated 
by dashes. Sequences of nucleotides present uniquely in the various clones (as 
compared to the corresponding sequences of the other ten clones) are indicated by 
letters, i.e., the sequences themselves. 

Fig. 14. Amino acid sequence alignments of the Gag polypeptides 
encoded by the 11 near full-length HIV-1 sequences included in this patent
application. The homologies between the sequences of amino acids in the various 
polypeptides encoded by the eleven independent clones are indicated by dashes. 
Sequences of amino acids present uniquely in the various polypeptides (as compared 
to the corresponding polypeptides of the other ten clones) are indicated by letters, i.e., 
the sequences themselves. 

Fig. 15. Amino acid sequence alignments of the Pol polypeptides
encoded by the 11 near full-length HIV-1 sequences included in this patent
application. The homologies between the sequences of amino acids in the various 
polypeptides encoded by the eleven independent clones are indicated by dashes. 
Sequences of amino acids present uniquely in the various polypeptides (as compared 
to the corresponding polypeptides of the other ten clones) are indicated by letters, i.e., 
the sequences themselves. 

Fig. 16. Amino acid sequence alignments of the Vif polypeptides 
encoded by the 11 near full-length HIV-1 sequences included in this patent
application. The homologies between the sequences of amino acids in the various 
polypeptides encoded by the eleven independent clones are indicated by dashes. 
Sequences of amino acids present uniquely in the various polypeptides (as compared 
to the corresponding polypeptides of the other ten clones) are indicated by letters, i.e., 
the sequences themselves. 

Fig. 17. Amino acid sequence alignments of the Vpr polypeptides 
encoded by the 11 near full-length HIV-1 sequences included in this patent
application. The homologies between the sequences of amino acids in the various 
polypeptides encoded by the eleven independent clones are indicated by dashes. 
Sequences of amino acids present uniquely in the various polypeptides (as compared 
to the corresponding polypeptides of the other ten clones) are indicated by letters, i.e., 
the sequences themselves. 

Fig. 18. Amino acid sequence alignments of the Tat polypeptides 
encoded by the 11 near full-length HIV-1 sequences included in this patent
application. The homologies between the sequences of amino acids in the various 
polypeptides encoded by the eleven independent clones are indicated by dashes. 
Sequences of amino acids present uniquely in the various polypeptides (as compared 
to the corresponding polypeptides of the other ten clones) are indicated by letters, i.e., 
the sequences themselves. 

Fig. 19. Amino acid sequence alignments of the Rev polypeptides 
encoded by the 11 near full-length HIV-1 sequences included in this patent
application. The homologies between the sequences of amino acids in the various 
polypeptides encoded by the eleven independent clones are indicated by dashes.
Sequences of amino acids present uniquely in the various polypeptides (as compared 
to the corresponding polypeptides of the other ten clones) are indicated by letters, i.e., 
the sequences themselves. 

Fig. 20. Amino acid sequence alignments of the Vpu polypeptides 
encoded by the 11 near full-length HIV-1 sequences included in this patent
application. The homologies between the sequences of amino acids in the various 
polypeptides encoded by the eleven independent clones are indicated by dashes. 
Sequences of amino acids present uniquely in the various polypeptides (as compared 
to the corresponding polypeptides of the other ten clones) are indicated by letters, i.e., 
the sequences themselves. 

Fig. 21. Amino acid sequence alignments of the Env polypeptides 
encoded by the 11 near full-length HIV-1 sequences included in this patent
application. The homologies between the sequences of amino acids in the various 
polypeptides encoded by the eleven independent clones are indicated by dashes. 
Sequences of amino acids present uniquely in the various polypeptides (as compared 
to the corresponding polypeptides of the other ten clones) are indicated by letters, i.e., 
the sequences themselves. 

Fig. 22. Amino acid sequence alignments of the Nef polypeptides 
encoded by the 11 near full-length HIV-1 sequences included in this patent
application. The homologies between the sequences of amino acids in the various 
polypeptides encoded by the eleven independent clones are indicated by dashes. 
Sequences of amino acids present uniquely in the various polypeptides (as compared 
to the corresponding polypeptides of the other ten clones) are indicated by letters, i.e., 
the sequences themselves. 

DETAILED DESCRIPTION OF THE INVENTION 
The present invention relates to the determination of the nucleic acid 
sequences of the complete or near complete genomes of 11 non-subtype B HIV-1 
viruses isolated from primary isolates collected at major epicenters of the global AIDS 
pandemic. The nucleotide sequences of these 11 viruses are shown in Fig. 13 (SEQ ID
NOS: to ).

The phrase "derived from" is used throughout the specification and
claims with respect to nucleic acids to describe nucleic acid sequences which 
correspond to a region of the designated nucleotide sequence. Preferably, the 
sequence of the region from which the nucleic acid is derived is, or is complementary 
to, a sequence which is unique to the genome of any one of the 11 clones of this
invention. However, more preferably, the sequence of the region from which the 
nucleic acid is derived is, or is complementary to, a sequence which is unique to the 
viruses in the subtype corresponding to the subtype of any one of the 11 clones of this
invention, and whose uniqueness was unknown prior to the disclosure of the clones of 
this invention. For example, sequences in the Cyprus clone 94CY032.3 which map to 
the I region are unique wherever they are not identical to known prior art sequences. 
Whether or not a sequence is unique to the genome of one of the molecular clones or a 
subtype can be determined by techniques known to those of skill in the art. For example, 
the sequence can be compared to sequences in databanks, e.g., GenBank, to determine 
whether it is present in the uninfected host or other organisms. The sequence can also be 
compared to the known sequences of other viral agents, including other retroviruses. The 
correspondence or non-correspondence of the derived sequence to other sequences can 
also be determined by hybridization under the appropriate stringency conditions. 
Hybridization techniques for determining the complementarity of nucleic acid sequences 
are well known in the art. In addition, mismatches of duplex polynucleotides formed by 
hybridization can be determined by known techniques, including for example, digestion 
with a nuclease such as S1 that specifically digests single-stranded areas in duplex
polynucleotides. 
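
As a simple illustration of the sequence-comparison step described above, the following sketch tests whether a candidate region occurs, on either strand, in any of a set of comparison sequences. It is a stand-in under stated assumptions: a real search would use a databank tool that tolerates mismatches, whereas this example checks exact matches only and accepts only the bases A, C, G, T and N. The function names and the toy sequences in the usage note are hypothetical.

```python
from typing import Dict

# Watson-Crick complements; N pairs with N as a placeholder for unknown bases.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def is_absent_from(candidate: str, comparison: Dict[str, str]) -> bool:
    """True if neither the candidate nor its reverse complement occurs exactly
    in any comparison sequence."""
    forward = candidate.upper()
    reverse = reverse_complement(forward)
    for seq in comparison.values():
        target = seq.upper()
        if forward in target or reverse in target:
            return False
    return True
```

For instance, `is_absent_from("AAACCC", {"other": "TTGGGTTTAA"})` is false because the comparison sequence contains the reverse complement `GGGTTT`, while `is_absent_from("AAACCC", {"other": "ACACACAC"})` is true. An exact-match result of this kind is only a first screen; hybridization behavior under the chosen stringency conditions still has to be assessed separately.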

Regions of the viral genome from which nucleic acid sequences may be 
derived include, but are not limited to, regions encoding specific epitopes as well as 
non-transcribed and non-translated sequences. Preferably, the epitope is unique to 
HIV viruses in the subtype corresponding to the subtype of the corresponding region 
of a polypeptide encoded by any one of the 11 clones of this invention, and whose
uniqueness was unknown prior to the disclosure of the clones of this invention. The
uniqueness of the epitope may be determined by its immunological reactivity with 
HIV viruses of the subtype and lack of immunological reactivity with other HIV 
viruses of the other subtypes. Methods for determining immunological reactivity are 
known in the art, e.g., radioimmunoassay and ELISA and other assays mentioned
herein. The uniqueness of an epitope can also be determined by computer searches of
known databases, e.g., for the polynucleotide sequences which encode the epitope, and
by amino acid sequence comparisons with other known proteins. 

The derived nucleic acid is not necessarily physically derived from the 
nucleotide sequence shown, but may be generated in any manner, including for 
example, chemical synthesis or DNA replication or reverse transcription or 
transcription, which are based on the information provided by the sequence of bases in 
the region(s) from which the nucleic acid is derived. The derived nucleic acid is 
comprised of at least 6-12 bases, more preferably at least 15-19 bases, more preferably 
at least 30 bases. The derived nucleic acid may also be larger, e.g., at least 100 bases 
in length, depending on the desired use of the nucleic acid. In addition, regions or 
combinations of regions corresponding to that of the designated sequence may be 
modified in ways known in the art to be consistent with an intended use. The derived 
nucleic acid may be a polynucleotide or polynucleotide analog. 

The term "recombinant nucleotide" or "recombinant nucleic acid" as 
used herein intends a nucleic acid of genomic, cDNA, semisynthetic, or synthetic 
origin which, by virtue of its origin or manipulation: (1) is not associated with all or a 
portion of the nucleic acid with which it is associated in nature; and/or (2) is linked to 
a nucleic acid other than that to which it is linked in nature. 

The term "polynucleotide" as used herein refers to a polymeric form of 
nucleotides of any length, either ribonucleotides or deoxyribonucleotides. This term 
refers only to the primary structure of the molecule. Thus, this term includes double- 
and single-stranded DNA, as well as double- and single-stranded RNA. It also 
includes modified, for example, by methylation and/or by capping, and unmodified 
forms of the polynucleotide. 

The present invention relates to nucleic acids having the genomic 
sequence of any one of the 11 molecular clones for non-subtype B HIV-1 isolates of this 
invention as shown in Fig. 13 (SEQ ID NOS: to ), as well as fragments (or 
partial sequences) thereof. The invention also relates to nucleic acids having 
complementary (or antisense) sequences to the sequences shown in Fig. 13 (SEQ ID 
NOS: to ), as well as fragments (or partial sequences) thereof. Partial 
sequences may be obtained by various methods, including restriction digestion of 
nucleic acids with sequences shown in Fig. 13 (SEQ ID NOS: to ), PCR 
amplification, and direct synthesis. Partial sequences may be all or part of the LTR 
and/or other untranslated regions of the genomes of one or more of the 11 viral clones 
of this invention, and/or all or part of the genes encoding the Gag, Pol, Vif, Vpr, Env, 
Tat, Rev, Nef and Vpu proteins and/or complementary (or antisense) sequences 
thereof. Nucleic acids of the invention also include cDNA, mRNA, and other nucleic 
acids derived from the genomic sequences of one or more of these 11 HIV-1 clones. 
Sequences of the genes encoding Gag, Pol, Vif, Vpr, Env, Tat, Rev, Nef and Vpu are 
identified in Fig. 13. 
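
As a non-limiting illustration, the relationship between a sequence, its complementary (antisense) sequence, and a partial sequence can be sketched in Python as follows. The sketch assumes DNA written with the unambiguous bases A, C, G and T. The function names and the example string are hypothetical placeholders and do not reproduce any sequence of Fig. 13.

```python
# Complement table; assumes unambiguous DNA bases (A, C, G, T) only.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the complementary (antisense) strand, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def partial_sequence(seq, start, length):
    """Return the fragment of `length` bases beginning at 0-based `start`."""
    if start < 0 or length < 1 or start + length > len(seq):
        raise ValueError("fragment lies outside the sequence")
    return seq[start:start + length]


example = "ATGGCCATTG"  # arbitrary placeholder, not a sequence from Fig. 13
antisense = reverse_complement(example)
fragment = partial_sequence(example, 3, 6)
```

Applying `reverse_complement` twice returns the starting sequence, which is a convenient sanity check when preparing antisense constructs in silico.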

Genomic sequences of seven of the 11 clones of the invention have 
been made publicly available. The GenBank Accession numbers are as follows: 

Clone           Accession No.    Sequence ID No. 
92RW009.6       U88823 
92NG003.1       U88825 
92NG083.2       U88826 
93BR020.1       AF005494 
93BR029.4       AF005495 
90CF056.1       AF005496 
94CY032.3       AF049337 
94CY017.41 
96ZM651.8 
96ZM751.3 
94IN476.104 


The nucleic acids of the invention may be present in vectors or host 
cells in tissue culture or other media. The nucleic acids of the invention may also be 
isolated and substantially purified by methods known in the art. 

Nucleic acids of about 17 bases to about 35 bases in length are 
particularly preferred for use as primers in PCR amplification (see, e.g., the primers 
UP1A and RAJS (17mer and 22mer, respectively) and UP1AMlu1 and Low1Mlu1 
(28mer and 35mer, respectively)). Nucleic acids of about 14 to about 25 bases in 
length are particularly preferred for use in nucleotide arrays (see, e.g., ref. 108, which 
uses 20- to 25-mers). 
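
For illustration, one simple way to enumerate candidate oligonucleotides of a fixed length from a longer sequence is a sliding window, sketched below in Python. This is an assumption about how such a set might be generated and is not a method recited in the specification or in ref. 108. The function name and default values are hypothetical.

```python
def tile_oligos(seq, length=20, step=1):
    """Return (start, oligo) pairs covering `seq` with oligos of fixed length.

    `start` is the 0-based position of each oligo in `seq`; `step` is the
    offset between the starts of consecutive oligos.
    """
    if length < 1 or step < 1:
        raise ValueError("length and step must be positive")
    return [(i, seq[i:i + length])
            for i in range(0, len(seq) - length + 1, step)]


# A 40-base placeholder sequence tiled with 25-mers at single-base offsets.
oligos = tile_oligos("ACGT" * 10, length=25)
```

A sequence shorter than the requested length yields an empty list rather than a truncated oligo.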

The present invention also relates to vectors and host cells comprising 
the nucleic acids of the invention. 

The present invention also relates to compositions comprising one or 
more of the nucleic acids, vectors, and/or host cells of the invention. 

The present invention further relates to methods of using the nucleic 
acids, vectors, and/or host cells of the invention, and/or compositions thereof. For 
example, the invention relates to the use of nucleic acids of the invention as diagnostic 
agents to detect the presence or absence of non-subtype B HIV-1 viruses in a sample. 

The present invention also relates to a method for detecting the 
presence of HIV-1 viruses which are related to the viruses of this invention in a 
mammal, using the nucleic acids of this invention. 

In one embodiment, the detection method involves analyzing DNA 
obtained from a mammal suspected of harboring HIV-1 viruses. DNA can be isolated 
by methods well known in the art. 

The methods for analyzing the DNA for the presence of the viruses of 
this invention include Southern blotting (86), dot and slot hybridization (87), and 
nucleotide arrays (see, e.g., US 5,445,934 and US 5,733,729). 

The nucleic acid probes used in the detection methods set forth above 
are derived from nucleic acid sequences shown in Fig. 13 (SEQ ID NOS: to ). 
The size of such probes is at least 10-12 bases long, more usually at least about 19 
bases long, more usually from about 200 to about 500 bases, and often exceeding 
about 1000 bases. 

The nucleic acid probes of this invention may be DNA or RNA. 
Nucleic acids can be synthesized using any of the known methods of nucleotide 
synthesis (see, e.g., refs. 88, 89, 90), or they can be isolated fragments of naturally 
occurring or cloned DNA. In addition, those skilled in the art would be aware that 
nucleotides can be synthesized by automated instruments sold by a variety of 
manufacturers or can be commercially custom ordered and prepared. The probes of 
this invention may also be nucleotide analogs, such as nucleotides linked by 
phosphodiester, phosphorothiodiester, methylphosphonodiester, or 
methylphosphonothiodiester moieties (91) and peptide nucleic acids (PNAs), in which 
the sugar-phosphate backbone of the polynucleotide is replaced with a polyamide or 
"pseudopeptide" backbone (92). 

The nucleic acid probes can be labeled using methods known to one 
skilled in the art. Such labeling techniques can include radioactive labels, biotin, 
avidin, enzymes and fluorescent molecules (93). 

The nucleic acid probes used in the detection methods set forth above 
are derived from sequences substantially homologous to one or more of the sequences 
shown in Fig. 13 (SEQ ID NOS: to ), or their complementary sequences. By 
"substantially homologous", as used throughout the specification and claims to 
describe the nucleic acid sequence of the present invention, is meant a high level of 
homology between the nucleic acid sequence and one or more of the sequences of Fig. 
13 (SEQ ID NOS: to ), or its complementary sequence. Preferably, the level of 
homology is in excess of 80%, more preferably in excess of 90%, with a preferred 
nucleic acid sequence being in excess of 95% homologous with a portion of one or 
more of the sequences shown in Fig. 13 (SEQ ID NOS: to ), or its complement. 
The size of such probes is usually at least 20 nucleotides, more usually from about 200 
to 500 nucleotides, and often exceeding 1000 nucleotides. 
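
The homology thresholds recited above can be illustrated with a minimal Python sketch. The sketch assumes that "homology" is measured as percent identity over two already-aligned sequences of equal length. The specification does not prescribe a particular calculation, so this is one possible reading, and the function names are hypothetical.

```python
def percent_identity(a, b):
    """Percent identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def homology_level(identity):
    """Return the highest recited threshold (95, 90 or 80) that is exceeded,
    or None if the identity is not in excess of 80%."""
    for threshold in (95, 90, 80):
        if identity > threshold:
            return threshold
    return None


# Two placeholder 10-mers differing at one position.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
level = homology_level(identity)
```

Because the text speaks of homology "in excess of" each value, the comparison is strict: a probe that is exactly 90% identical falls in the 80% tier.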

Although complete complementarity is not necessary, it is preferred 
that the probes are made completely complementary to the corresponding portion of 
the genome, mRNA or cDNA target of at least one of the 11 viruses of this invention. 

The probes can be packaged into diagnostic kits. Diagnostic kits may 
include ingredients for labeling and other reagents and materials needed for the 
particular hybridization protocol in addition to the probes. 

In another embodiment of the invention, the detection method 
comprises analyzing the RNA of a mammal for the presence of HIV-1 viruses which 
are related to one or more of the 11 viruses of this invention. RNA can be isolated 
by methods well known in the art. 

The methods for analyzing the RNA for the presence of the viruses of 
this invention include Northern blotting (94), dot and slot hybridization, filter 
hybridization (95), RNase protection (93), and reverse-transcription polymerase chain 
reaction (RT-PCR) (96). A preferred method is RT-PCR. In this method, the RNA 
can be reverse transcribed to first strand cDNA using a nucleic acid primer or primers 
derived from one or more of the nucleotide sequences shown in Fig. 13 (SEQ ID 
NOS: to ). Once the cDNAs are synthesized, PCR amplification is carried out 
using pairs of primers designed to hybridize with sequences in the genomes of one or 
more of the non-subtype B HIV-1 viruses of this invention which are an appropriate 
distance apart (at least about 50 bases) to permit amplification of the cDNA and 
subsequent detection of the amplification product. Each primer of a pair is a single- 
stranded nucleic acid of about 20 to about 60 bases in length where one primer (the 
"upstream" primer) is complementary to the original RNA and the second primer (the 
"downstream" primer) is complementary to the first strand of cDNA generated by 
reverse transcription of the RNA. The target sequence is generally about 100 to about 
300 bases in length but can be as large as 500-1500 bases or more, e.g., 9,000 bases. 
Optimization of the amplification reaction to obtain sufficiently specific hybridization 
to the nucleotide sequences of these viruses is well within the skill in the art and is 
preferably achieved by adjusting the annealing temperature. 
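
The primer geometry described above can be checked in silico before any amplification is attempted. The following Python sketch is illustrative only. It assumes the template is given as the sense strand in DNA letters, so that the primer complementary to the RNA is located through its reverse complement and the primer complementary to the first strand cDNA matches the template directly. The 2(A+T) + 4(G+C) melting temperature estimate is a common rule of thumb for short oligonucleotides and is not taken from the specification. All names and sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def wallace_tm(primer):
    """Rule-of-thumb melting temperature in deg C: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def primer_sites(template, sense_primer, antisense_primer):
    """Return (product_length, gap_between_primers), or None if either
    primer has no exact binding site in the expected orientation."""
    template = template.upper()
    sense = sense_primer.upper()
    start = template.find(sense)
    if start == -1:
        return None
    site = reverse_complement(antisense_primer)
    end = template.find(site, start + len(sense))
    if end == -1:
        return None
    return end + len(site) - start, end - (start + len(sense))


# Placeholder template: two 6-base primer sites separated by 50 bases.
template = "ACGTAC" + "G" * 50 + "TTGACA"
result = primer_sites(template, "ACGTAC", "TGTCAA")
```

The second value returned can be compared against the spacing of at least about 50 bases mentioned in the text, and `wallace_tm` gives a rough starting point when the annealing temperature is adjusted.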

The amplification products of PCR can be detected either directly or 
indirectly. In one embodiment, direct detection of the amplification products is carried 
out via labeling of primer pairs. Labels suitable for labeling the primers of the present 
invention are known to one skilled in the art and include radioactive labels, biotin, 
avidin, enzymes and fluorescent molecules. The desired labels can be incorporated 
into the primers prior to performing the amplification reaction. Alternatively, the 
desired labels can be incorporated into the primer extension products during the 
amplification reaction in the form of one or more labeled dNTPs. In one embodiment 
of the present invention, the labeled amplified PCR products can be detected by 
agarose gel electrophoresis followed by ethidium bromide staining and visualization 
under ultraviolet light or via direct sequencing of the PCR-products. The labeled 
amplified PCR products can also be detected by binding to immobilized 
oligonucleotide arrays. 

In yet another embodiment, unlabelled amplification products can be 
detected via hybridization with labeled nucleic acid probes in methods known to one 
skilled in the art, such as dot or slot blot hybridization or filter hybridization. 

The invention also relates to methods of using these nucleic acids to 
produce polypeptides in vitro or in vivo. 

In one embodiment of the invention, a recombinant method of making a 
polypeptide of the invention comprises: 

(a) preparing a nucleic acid capable of directing a host cell to 
produce a polypeptide encoded by the genome of any one of the non-subtype B HIV-1 
viruses of this invention; 

(b) cloning the nucleic acid into a vector capable of being 
transferred into and replicated in a host cell, such vector containing operational 
elements for expressing the nucleic acid, if necessary; 

(c) transferring the vector containing the nucleic acid and 
operational elements into a host cell capable of expressing the polypeptide; 

(d) growing the host under conditions appropriate for expression of 
the polypeptide; and 

(e) harvesting the polypeptide. 

The present invention also relates to non-recombinant methods of 
making the polypeptides and nucleic acids of the invention. In addition to synthetic 
methods, the non-recombinant methods involve culturing the viruses of this invention 
in cell lines, preferably in uninfected human peripheral blood mononuclear cells, under 
conditions appropriate for expression of the polypeptides and nucleic acids. This 
invention thus also relates to the polypeptides and nucleic acids produced by the virus 
in cell culture. The polypeptides and nucleic acids may be isolated and purified by 
methods known in the art. 

The vectors contemplated for use in the present invention include any 
vectors into which a nucleic acid sequence as described above can be inserted, along 
with any preferred or required operational elements, and which vector can then be 
subsequently transferred into a host cell and, preferably, replicated in such cell. 
Preferred vectors are those whose restriction sites have been well documented and 
which contain the operational elements preferred or required for transcription of the 
nucleic acid sequence. Vectors may also be used to prepare large amounts of nucleic 
acids of the invention, which may be used, e.g., to prepare probes or other nucleic acid 
constructs. 

When expression of a polypeptide is desired, the "operational 
elements" as discussed herein include at least one promoter sequence capable of 
initiating transcription of the nucleic acid sequence, at least one leader sequence, at 
least one terminator codon and/or termination signal, and any other DNA sequences 
necessary or preferred for appropriate transcription and subsequent translation of the 
vector nucleic acid. In particular, it is contemplated that such vectors will preferably 
contain at least one origin of replication recognized by the host cell along with at least 
one selectable marker. 

Preferred expression vectors of this invention are those which function 
in bacterial and/or eukaryotic cells. Examples of vectors which function in eukaryotic 
cells include, but are not limited to Venezuelan equine encephalitis virus vectors, 
simian virus vectors, vaccinia virus vectors, adenovirus vectors, herpes virus vectors, 
or vectors based on retroviruses, such as murine leukemia virus, or HIV or other 
lentivirus (97). 

The selected expression vector may be transfected into a suitable 
bacterial or eukaryotic cell system for purposes of expressing the recombinant 
polypeptide. Eukaryotic cell systems include but are not limited to cell lines such as 
HeLa, COS-1, 293T, MRC-5, or CV-1 cells. Primary human cells, such as lymph 
node cells, macrophages, etc., are also useful in practicing the invention. 

The expressed polypeptides may be detected directly by methods 
known in the art including, but not limited to, Coomassie blue staining and Western 
blotting or indirectly, such as in detection of the expression product of a reporter gene, 
such as luciferase. 

In another embodiment of the invention, the method comprises 
administering a composition comprising a vector comprising a nucleic acid of the 
invention to a mammal to produce a polypeptide in vivo. 

The present invention also relates to polypeptides encoded by and/or 
derived from the nucleotide sequences of this invention. These polypeptides may be 
natural, synthetic or produced by recombinant methods. Polypeptides can be obtained 
as a crude lysate or can be purified by standard protein purification procedures known 
in the art which may include differential precipitation, molecular size exclusion 
chromatography, ion-exchange chromatography, isoelectric focusing, gel 
electrophoresis and affinity and immunoaffinity chromatography. The polypeptides 
may be purified by passage through a column containing a resin which has bound 
thereto antibodies specific for an open reading frame (ORF) polypeptide. The present 
invention also relates to compositions comprising one or more of the polypeptides of 
the invention. 

A polypeptide or amino acid sequence derived from a designated 
nucleic acid sequence refers to a polypeptide having an amino acid sequence identical 
to that of a polypeptide encoded by the sequence, or a portion thereof wherein the 
portion consists of at least 6-8 amino acids, and more preferably at least 10 amino 
acids, and more preferably at least 11-15 amino acids, and most preferably at least 30 
amino acids or which is immunologically cross-reactive with a polypeptide encoded 
by the sequence. The polypeptide may also be larger, e.g., at least 100 amino acids in 
length, depending on the desired use of the polypeptide. Polypeptides from the V3- 
loop region and the "crown" of gp41 of Env are particularly preferred. 

A recombinant or derived polypeptide is not necessarily translated from 
a designated nucleic acid sequence; it may be generated in any manner, including for 
example, chemical synthesis, or expression of a recombinant expression system, or 
isolation from any of the 11 HIV-1 viruses of this invention. 

It should be noted that the nucleotide sequences described herein 
represent one embodiment of the present invention. Due to the degeneracy of the 
genetic code, it is to be understood that numerous choices of nucleotides may be made 
that will lead to a sequence capable of directing production of the polypeptides set 
forth above. As such, nucleic acid sequences which are functionally equivalent to the 
sequences described herein are intended to be encompassed within the present 
invention. For example, preferred codons which are appropriate to the host cell may 
be used (see, e.g., WO 98/34640), or the sequence may be modified to reduce the 
effect of any inhibitory/instability sequences and to provide for Rev-independent gene 
expression (98). 
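
The degeneracy referred to above can be made concrete with a short Python sketch built on the standard genetic code. It shows that two different nucleotide sequences can encode the same peptide and counts how many coding sequences exist for a given peptide. The function names and example sequences are hypothetical. The sketch does not implement the codon optimization of WO 98/34640 or the modifications of ref. 98.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' = stop).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(dna):
    """Translate a DNA coding sequence, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def count_encodings(peptide):
    """Number of distinct nucleotide sequences encoding `peptide`."""
    total = 1
    for aa in peptide.upper():
        total *= sum(1 for v in CODON_TABLE.values() if v == aa)
    return total


# Two different placeholder sequences that encode the same dipeptide.
same_peptide = translate("CTGTTA") == translate("TTGCTC")
```

Even a tripeptide such as Leu-Ser-Arg has 216 possible coding sequences, which is why a host-preferred codon set can be chosen without changing the encoded polypeptide.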

The polypeptides of this invention consist of at least 6-12 amino acids, 
more preferably at least 13-18 amino acids, even more preferably at least 19-24 amino 
acids and most preferably at least 25-30 amino acids encoded by, or otherwise derived 
from, any one of the genomic sequences shown in Fig. 13 (SEQ ID NOS: to ). 

The present invention further relates to the use of polypeptides of the 
invention as diagnostic agents. 

In one embodiment, the polypeptides of the invention can be used in 
immunoassays for detecting the presence of antibodies against non-subtype B HIV-1 
viruses in a mammal and for diagnosing the presence of infection of any of these 
viruses in a mammal. 

For the purposes of the present invention, "mammal" as used 
throughout the specification and claims, includes, but is not limited to humans, 
chimpanzees, mangabeys, and other primates. 

In a preferred embodiment, test serum is reacted with a solid phase 
reagent having a surface-bound polypeptide of this invention as an antigen. The solid 
surface reagent can be prepared by known techniques for attaching polypeptides to 
solid support material. These attachment methods include non-specific adsorption of 
the polypeptide to the support or covalent attachment of the polypeptide to a reactive 
group on the support. After reaction of the antigen with an antibody against any one 
of the viruses of this invention in the serum, unbound serum components are removed 
by washing and the antigen-antibody complex is reacted with a secondary antibody 
such as labeled anti-human antibody. The label may be an enzyme which is detected 
by incubating the solid support in the presence of a suitable fluorimetric or 
colorimetric reagent. Other detectable labels may also be used, such as radiolabels or 
colloidal gold, and the like. 

Immunoassays of the present invention may be a radioimmunoassay, 
Western blot assay, immunofluorescent assay, enzyme immunoassay, 
chemiluminescent assay, immunohistochemical assay and the like. Standard 
techniques for ELISA are well known in the art. Such assays may be a direct, indirect, 
competitive, or noncompetitive immunoassay as described in the art (see, e.g., ref. 99). 
Biological samples appropriate for such detection assays include, but are not limited to 
serum, liver, saliva, lymphocytes or other mononuclear cells. 

Polypeptides of the invention may be prepared in the form of a kit, 
alone, or in combinations with other reagents such as secondary antibodies, for use in 
immunoassays. 

In yet another embodiment, the polypeptides of the invention can be 
used as immunogens to raise antibodies and/or stimulate cellular immunity in a 
mammal. 

The immunogen may be a partially or substantially purified peptide. 
Alternatively, the immunogen may be a cell, cell lysate from cells transfected with a 
recombinant expression vector, or a culture supernatant containing the expressed 
polypeptide. The immunogen may comprise one or more structural proteins, and/or 
one or more non-structural proteins of the HIV-1 clones of this invention, or a mixture 
thereof. 

The effective amount of polypeptide per unit dose sufficient to induce 
an immune response depends, among other things, on the species of mammal 
inoculated, the body weight of the mammal and the chosen inoculation regimen, as 
well as the presence or absence of an adjuvant, as is well known in the art. Inocula 
typically contain polypeptide concentrations of about 1 microgram to about 50 
milligrams per inoculation (dose), preferably about 10 micrograms to about 10 
milligrams per dose, most preferably about 100 micrograms to about 5 milligrams per 
dose. 

The term "unit dose" as it pertains to the inocula refers to physically 
discrete units suitable as unitary dosages for mammals, each unit containing a 
predetermined quantity of active material (polypeptide) calculated to produce the 
desired immunogenic effect in association with the required diluent. 

Inocula are typically prepared as a solution in a physiologically 
acceptable carrier such as saline, phosphate-buffered saline and the like to form an 
aqueous pharmaceutical composition. 

The route of inoculation of the polypeptides of the invention is typically 
parenteral and is preferably intramuscular, sub-cutaneous and the like. The dose is 
administered at least once. In order to increase the antibody level, at least one booster 
dose may be administered after the initial injection, preferably at about 4 to 6 weeks 
after the first dose. Subsequent doses may be administered as indicated. 

To monitor the antibody response of individuals administered the 
compositions of the invention, antibody titers may be determined. In most instances it 
will be sufficient to assess the antibody titer in serum or plasma obtained from such an 
individual. Decisions as to whether to administer booster inoculations or to change the 
amount of the composition administered to the individual may be at least partially 
based on the titer. 

The titer may be based on an immunobinding assay which measures the 
concentration of antibodies in the serum which bind to a specific antigen. The ability 
to neutralize in vitro and in vivo biological effects of the viruses of this invention may 
also be assessed to determine the effectiveness of the immunization. 

For all therapeutic, prophylactic and diagnostic uses, the polypeptide of 
the invention, alone or linked to a carrier, as well as antibodies and other necessary 
reagents and appropriate devices and accessories may be provided in kit form so as to 
be readily available and easily used. 

Where immunoassays are involved, such kits may contain a solid 
support, such as a membrane (e.g., nitrocellulose), a bead, sphere, test tube, microtiter 
well, rod, and so forth, to which a receptor such as an antibody specific for the target 
molecule will bind. Such kits can also include a second receptor, such as a labeled 
antibody. Such kits can be used for sandwich assays. Kits for competitive assays are 
also envisioned. 

The immunogens of this invention can also be generated by the direct 
administration of nucleic acids of this invention to a subject. DNA-based vaccination 
has been shown to stimulate humoral and cellular responses to HIV-1 antigens in mice 
(100-103) and macaques (103, 104). More recent studies in infected chimpanzees 
have shown a possible application of this strategy in HIV-1-infected humans: DNA 
vaccination of HIV-1-infected chimpanzees with a construct that drives expression of 
HIV-1 env and rev appeared well-tolerated, and immunized animals demonstrated a 
boost in antibody response followed by a >1 log decrease in their virus loads (104). A 
DNA-based vaccine containing HIV-1 env and rev genes was injected into HIV- 
infected human patients in three doses (30, 100 or 300 micrograms) at 10-week 
intervals. Increased antibodies against gp120 were observed in the 100 and 300 µg 
groups. Increases were also noted in cytotoxic T lymphocyte (CTL) activity against 
gp160-bearing targets and in lymphocyte proliferative activity (105, 106). DNA-based 
vaccines containing HIV gag genes, with modification of the viral nucleotide sequence 
to incorporate host-preferred codons (see, e.g., WO 98/34640), and/or to reduce the 
effect of inhibitory/instability sequences (see, e.g., ref. 98), have likewise been 
described. 

Therefore, it is anticipated that the direct injection of RNA or DNA 
vectors of this invention encoding viral antigen can be used for endogenous expression 
of the antigen to generate the viral antigen for presentation to the immune system 
without the need for self-replicating agents or adjuvants, resulting in the generation of 
antigen-specific CTLs and protection from a subsequent challenge with a homologous 
or heterologous strain of virus. 

CTLs in both mice and humans are capable of recognizing epitopes 
derived from conserved internal viral proteins and are thought to be important in the 
immune response against viruses. By recognition of epitopes from conserved viral 
proteins, CTLs may provide cross-strain protection. CTLs specific for conserved viral 
antigens can respond to different strains of virus, in contrast to antibodies, which are 
generally strain-specific. 

Thus, direct injection of RNA or DNA encoding the viral antigen has 
the advantage of being without some of the limitations of direct peptide delivery or 
viral vectors (see, e.g., ref. 107 and the discussions and references therein). 
Furthermore, the generation of high-titer antibodies to expressed proteins after 
injection of DNA indicates that this may be a facile and effective means of making 
antibody-based vaccines targeted towards conserved or non-conserved antigens, either 
separately or in combination with CTL vaccines targeted towards conserved antigens. 
These may also be used with traditional peptide vaccines, for the generation of 
combination vaccines. Furthermore, because protein expression is maintained after 
DNA injection, the persistence of B and T cell memory may be enhanced, thereby 
engendering long-lived humoral and cell-mediated immunity. 

Nucleic acids encoding a polypeptide of this invention can be 
introduced into animals or humans in a physiologically or pharmaceutically acceptable 
carrier using one of several techniques such as injection of DNA directly into human 
tissues; electroporation or transfection of the DNA into primary human cells in culture 
(ex vivo), selection of cells for desired properties and reintroduction of such cells into 
the body, (said selection can be for the successful homologous recombination of the 
incoming DNA to an appropriate preselected genomic region); generation of infectious 
particles containing the gag and/or other genes encoded by the viruses of this 
invention, infection of cells ex vivo and reintroduction of such cells into the body; or 
direct infection by said particles in vivo. Substantial levels of polypeptide will be 
produced leading to an efficient stimulation of the immune system. 

Also envisioned are therapies based upon vectors, such as viral vectors 
containing nucleic acid sequences coding for the polypeptides described herein. These 
molecules, developed so that they do not provoke a pathological effect, will stimulate 
the immune system to respond to the polypeptides. 

The effective amount of nucleic acid immunogen per unit dose to 
induce an immune response depends, among other things, on the species of mammal 
inoculated, the body weight of the mammal and the chosen inoculation regimen, as is 
well known in the art. Inocula typically contain nucleic acid concentrations of about 1 
microgram to about 50 milligrams per inoculation (dose), preferably about 10 
micrograms to about 10 milligrams per dose, most preferably about 100 micrograms to 
about 5 milligrams per dose. 

Immunization can be conducted by conventional methods. For 
example, the immunogen can be used in a suitable diluent such as saline or water, or 
complete or incomplete adjuvants. Further, the immunogen may or may not be bound 
to a carrier. While it is possible for the immunogen to be administered in a pure or 
substantially pure form, it is preferable to present it as a pharmaceutical composition, 
formulation or preparation. 

The formulations of the present invention, both for veterinary and for 
human use, comprise an immunogen as described above, together with one or more 
physiologically or pharmaceutically acceptable carriers and optionally other 
therapeutic ingredients. The carrier(s) must be "acceptable" in the sense of being 
compatible with the other ingredients of the formulation and not deleterious to the 
recipient thereof. The formulations may conveniently be presented in unit dosage 
form and may be prepared by any method well-known in the pharmaceutical art. The 
immunogen can be administered by any route appropriate for antibody production 
such as intravenous, intraperitoneal, intramuscular, subcutaneous, and the like. The 
immunogen may be administered once or at periodic intervals until a significant titer 
of antibody against any of the 11 viruses of this invention is produced. The antibody 
may be detected in the serum using an immunoassay. The host serum or plasma may 
be collected following an appropriate time interval to provide a composition 
comprising antibodies reactive with the virus particle or encoded polypeptide. The 
gamma globulin fraction or the IgG antibodies can be obtained, for example, by use of 
saturated ammonium sulfate or DEAE Sephadex, or other techniques known to those 
skilled in the art. 

In addition to its use to raise antibodies, the administration of the 
immunogens of the present invention may be for use as a vaccine for either a 
prophylactic or therapeutic purpose. When provided prophylactically, a vaccine(s) of 
the invention is provided in advance of any exposure to any one or more of the 11 
non-subtype B viruses of this invention or in advance of any symptoms due to infection of 
these viruses. The prophylactic administration of a vaccine(s) of the invention serves 
to prevent or attenuate any subsequent infection of these viruses in a mammal. When 
provided therapeutically, a vaccine(s) of the invention is provided at (or shortly after) 
the onset of infection or at the onset of any symptom of infection or any disease or 
deleterious effects caused by these viruses. The therapeutic administration of the 
vaccine(s) serves to attenuate the infection or disease. The vaccine(s) of the present 
invention may, thus, be provided either prior to the anticipated exposure to the viruses 
of this invention or after the initiation of infection. 

In another embodiment, the polypeptides of the invention can be used 
to prepare antibodies against epitopes of the viruses of this invention that are useful in 
diagnosis. 

The term "antibodies" is used herein to refer to immunoglobulin 
molecules and immunologically active portions of immunoglobulin molecules. 
Exemplary antibody molecules are intact immunoglobulin molecules, substantially 
intact immunoglobulin molecules and portions of an immunoglobulin molecule, 
including those portions known in the art as Fab, Fab', F(ab')2 and F(v) as well as 
chimeric antibody molecules. 

An antibody of the present invention is typically produced by 
immunizing a mammal with an immunogen or vaccine of the invention. In one 
embodiment, the immunogen or vaccine contains one or more polypeptides of the 
invention, or a structurally and/or antigenically related molecule, to induce, in the 
mammal, antibody molecules having immunospecificity for the immunizing peptide or 
peptides. The peptide(s) or related molecule(s) may be monomeric, polymeric, 
conjugated to a carrier, and/or administered in the presence of an adjuvant. In another 
embodiment, the immunogen or vaccine contains one or more nucleic acids encoding 
one or more polypeptides of the invention, or one or more nucleic acids encoding 
structurally and/or antigenically related molecules, to induce, in the mammal, the 
production of the immunizing peptide or peptides. The antibody molecules may then 
be collected from the mammal if they are to be used in immunoassays or for providing 
passive immunity. 

The antibody molecules of the present invention may be polyclonal or 
monoclonal. Monoclonal antibodies may be produced by methods known in the art. 
Portions of immunoglobulin molecules may also be produced by methods known in 
the art. 

The antibody of the present invention may be contained in various 
carriers or media, including blood, plasma, serum (e.g., fractionated or unfractionated 
serum), hybridoma supernatants and the like. Alternatively, the antibody of the 
present invention is isolated to the extent desired by well known techniques such as, 
for example, by using DEAE SEPHADEX, or affinity chromatography. The 
antibodies may be purified so as to obtain specific classes or subclasses of antibody 
such as IgM, IgG, IgA, IgG1, IgG2, IgG3, IgG4 and the like. Antibodies of the IgG class 
are preferred for purposes of passive protection. 

The presence of the antibodies of the present invention, either 
polyclonal or monoclonal, can be determined by methods including, but not limited to, 
the various immunoassays described above. 

The antibodies of the present invention have a number of diagnostic 
and therapeutic uses. The antibodies can be used as an in vitro diagnostic agent to test 
for the presence of any one or more of the 11 HIV-1 viruses of this invention in 
biological samples in standard immunoassay protocols. Preferably, the assays which 
use the antibodies to detect the presence of these viruses in a sample involve 
contacting the sample with at least one of the antibodies under conditions which will 
allow the formation of an immunological complex between the antibody and the viral 
antigen that may be present in the sample. The formation of an immunological 
complex if any, indicating the presence of one or more of these viruses in the sample, 
is then detected and measured by suitable means. Such assays include, but are not 
limited to, radioimmunoassay (RIA), ELISA, indirect immunofluorescence assay, 
Western blot and the like. The antibodies may be labeled or unlabeled depending on 
the type of assay used. Labels which may be coupled to the antibodies include those 
known in the art and include, but are not limited to, enzymes, radionuclides, 
fluorogenic and chromogenic substrates, cofactors, biotin/avidin, colloidal gold and 
magnetic particles. Modification of the antibodies allows for coupling by any known 
means to carrier proteins or peptides or to known supports, for example, polystyrene or 
polyvinyl microtiter plates, glass tubes or glass beads and chromatographic supports, 
such as paper, cellulose and cellulose derivatives, and silica. 

Such assays may be, for example, of direct format (where the labeled 
first antibody reacts with the antigen), an indirect format (where a labeled second 
antibody reacts with the first antibody), a competitive format (such as the addition of a 
labeled antigen), or a sandwich format (where both labeled and unlabelled antibody 
are utilized), as well as other formats described in the art. In one such assay, the 
biological sample is contacted with antibodies of the present invention and a labeled 
second antibody is used to detect the presence of any one of the HIV-1 viruses of this 
invention, to which the antibodies are bound. 

The antibodies of the present invention are also useful as a means of 
enhancing the immune response. 

The antibodies may be administered with a physiologically or 
pharmaceutically acceptable carrier or vehicle therefor. A physiologically acceptable 
carrier is one that does not cause an adverse physical reaction upon administration and 
one in which the antibodies are sufficiently soluble and retain their activity to deliver a 
therapeutically effective amount of the compound. The therapeutically effective 
amount and method of administration of the antibodies may vary based on the 
individual patient, the indication being treated and other criteria evident to one of 
ordinary skill in the art. A therapeutically effective amount of the antibodies is one 
sufficient to reduce the level of infection by one or more of the viruses of this 
invention or attenuate any dysfunction caused by viral infection without causing 
significant side effects such as non-specific T cell lysis or organ damage. 

The route(s) of administration useful in a particular application are 
apparent to one of ordinary skill in the art. Routes of administration of the antibodies 
include, but are not limited to, parenteral administration and direct injection into an affected site. 
Parenteral routes of administration include but are not limited to intravenous, 
intramuscular, intraperitoneal and subcutaneous. 

The present invention includes compositions of the antibodies 
described above, suitable for parenteral administration including, but not limited to, 
pharmaceutically acceptable sterile isotonic solutions. Such solutions include, but are 
not limited to, saline and phosphate buffered saline for intravenous, intramuscular, 
intraperitoneal, or subcutaneous injection, or direct injection into a joint or other area. 

Antibodies for use to elicit passive immunity in humans are preferably 
obtained from other humans previously inoculated with pharmaceutical compositions 
comprising one or more of the polypeptides of the invention. Alternatively, antibodies 
derived from other species may also be used. Such antibodies used in therapeutics 
suffer from several drawbacks such as a limited half-life and propensity to elicit an 
immune response. Several methods are available to overcome these drawbacks. 
Antibodies made by these methods are encompassed by the present invention and are 
included herein. One such method is the "humanizing" of non-human antibodies by 
cloning the gene segment encoding the antigen binding region of the antibody to the 
human gene segments encoding the remainder of the antibody. Only the binding 
region of the antibody is thus recognized as foreign and is much less likely to cause an 
immune response. 

In providing the antibodies of the present invention to a recipient 
mammal, preferably a human, the dosage of administered antibodies will vary 
depending upon such factors as the mammal's age, weight, height, sex, general 
medical condition, previous medical history and the like. 

In general, it is desirable to provide the recipient with a dosage of 
antibodies which is in the range of from about 5 mg/kg to about 20 mg/kg body weight 
of the mammal, although a lower or higher dose may be administered. In general, the 
antibodies will be administered intravenously (IV) or intramuscularly (IM). 

The invention also relates to the use of antisense nucleic acids to inhibit 
translation of peptides encoded by the HIV-1 viruses of this invention. The antisense 
nucleic acids are complementary to the viral mRNAs encoding peptides of this 
invention. The antisense nucleic acids may be in the form of synthetic nucleic acids or 
they may be encoded by a nucleotide construct, or they may be semi-synthetic. The 
antisense nucleic acids may be delivered to the cells using methods known to those 
skilled in the art. 

Kits designed for diagnosis of the HIV-1 viruses of this invention in a 
biological sample can be constructed by packaging the appropriate materials, including 
the nucleic acids and/or polypeptides of this invention and/or antibodies which 
specifically react with antigens of one or more of these viruses, along with other 
reagents and materials required for the particular assay. 

The present invention further relates to computer-generated alignments 
of any one or more of the nucleotide sequences shown in Fig. 13 (SEQ ID NOS:    to 
   ). Computer analysis of the nucleotide sequences, such as those shown in Fig. 
13, can be carried out using commercially available computer programs known to one 
skilled in the art. 

In one embodiment, the sequences shown in Fig. 13 (SEQ ID NOS: 
   to    ) are aligned by the computer program CLUSTAL (67) and adjusted with the 
multiple aligned sequence editor MASE (12). The computer analysis results in the 
distribution of the 11 sequences into various genotypes. Five of these sequences represent 
non-recombinant members of HIV-1 subtypes, and the other six sequences represent 
HIV-1 intersubtype recombinants. 

The grouping of the molecular clones into mosaic and non-mosaic 
genotypes is shown below: 

Name of Clone     Genotype 
94CY017.41        A/? 
94IN476.104       C 
96ZM651.8         C 
96ZM751.3         C 
93BR020.1         F 
90CF056.1         H 
92RW009.6         A/C 
92NG083.2         A/G 
92NG003.1         A/G 
93BR029.4         B/F 
94CY032.3         A/G/I 

For those sequences representing recombinant members of HIV-1, a 
variety of phylogenetic methods were used to further characterize the subtype 
composition. 

The multiple computer-generated alignments of nucleotide sequences 
are shown in Figure 13. The multiple computer-generated alignments of encoded 
amino acid sequences are shown in Figures 14-22. These alignments serve to 
highlight regions of homology and non-homology between different sequences and 
hence, can be used by one skilled in the art to design oligonucleotides and 
polypeptides useful as reagents in diagnostic assays for HIV-1. 

The following examples illustrate certain embodiments of the present 
invention, but should not be construed as limiting its scope in any way. Certain 
modifications and variations will be apparent to those skilled in the art from the 
teachings of the foregoing disclosure and the following examples, and these are 
intended to be encompassed by the spirit and scope of the invention. 

EXAMPLE 1 
Materials and Methods 

Virus isolates 

All viruses used were propagated in normal donor peripheral blood 
mononuclear cells (PBMCs) and thus represent primary isolates. Their biological 
phenotype (SI/NSI), year of isolation, relevant epidemiological and clinical 
information, as well as appropriate references are summarized in Table 1. For 
consistency, isolates are labeled according to WHO nomenclature (28). Preliminary 
subtype classification was made on the basis of partial env and/or gag gene sequences 
(1,17,19,43). 

Amplification of near complete HIV-1 genomes using long PCR methods 

(Near) full length HIV-1 genomes were amplified from short-term 
cultured PBMC DNA essentially as described (18,56) using the GeneAmp XL kit 
(Perkin Elmer Cetus, Foster City, Calif.) and primers spanning the tRNA primer 
binding site (upstream primer UP1A: 5'-AGTGGCGCCCGAACAGG-3') (SEQ ID 
NO:    ) and the R/U5 junction in the 3' long terminal repeat (downstream primer 
Low2: 5'-TGAGGCTTAAGCAGTGGGTTTC-3') (SEQ ID NO:    ). Some isolates 
were amplified with primers containing MluI restriction enzyme sites to facilitate 
subsequent subcloning into plasmid vectors (upstream primer UP1AMluI: 
5'-TCTCTacgcgtGGCGCCCGAACAGGGAC-3' (SEQ ID NO:    ); downstream 
primer Low1MluI: 5'-ACCAGacgcgtACAACAGACGGGCACACACTACTT-3' 
(SEQ ID NO:    ); lower case letters indicate the MluI restriction site). Whenever 
possible, PBMC DNAs were diluted prior to PCR analysis to attempt amplification 
from single proviral templates. Cycling conditions included a hot start (94°C; 2 min), 
followed by 20 cycles of denaturation (94°C; 30 sec) and extension (68°C; 10 min), 
followed by 17 cycles of denaturation (94°C; 30 sec) and extension (68°C; 10 min) 
with 15 second increments per cycle. PCR products were visualized by agarose gel 
electrophoresis and subcloned into pCRII by T/A overhang or following cleavage with 
MluI into a modified pTZ18 vector (pTZ18MluI) containing a unique MluI site in its 
polylinker. Transformations were performed in INVαF' cells, and colonies were 
screened by restriction enzyme digestion for full length inserts (transformation 
efficiencies were generally poor, yielding only a few recombinant colonies; however, 
once subcloned, full length genomes were stable in their respective vectors). One full 
length clone per isolate was randomly chosen for subsequent sequence analysis. 
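
By way of illustration only, the primer design and cycling program described above can be checked with a short script. The sketch below is not part of the original protocol. It assumes the primer sequences as printed above with extraction spacing removed, takes ACGCGT as the MluI recognition sequence, and assumes that the 15 second increment is added to the extension step of each successive cycle in the second block of 17 cycles, starting from 10 min. The function names are hypothetical.

```python
# Illustrative sketch only; names and the increment interpretation are assumptions.
MLUI_SITE = "ACGCGT"  # MluI recognition sequence

# Primer sequences as given in the text (lower case marks the MluI site).
PRIMERS = {
    "UP1A": "AGTGGCGCCCGAACAGG",
    "Low2": "TGAGGCTTAAGCAGTGGGTTTC",
    "UP1AMluI": "TCTCTacgcgtGGCGCCCGAACAGGGAC",
    "Low1MluI": "ACCAGacgcgtACAACAGACGGGCACACACTACTT",
}


def has_mlui_site(primer):
    """Return True if the primer contains an MluI recognition sequence."""
    return MLUI_SITE in primer.upper()


def cycling_seconds(hot_start=120, denature=30, extension=600,
                    plain_cycles=20, ramped_cycles=17, increment=15):
    """Total programmed hold time in seconds, excluding ramp times.

    Assumes cycle i (0-based) of the second block extends for
    extension + i * increment seconds.
    """
    plain = plain_cycles * (denature + extension)
    ramped = sum(denature + extension + increment * i
                 for i in range(ramped_cycles))
    return hot_start + plain + ramped


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), "nt, MluI site:", has_mlui_site(seq))
    print("programmed time (h):", cycling_seconds() / 3600)
```

Under these assumptions the programmed hold time is 25,470 seconds, or roughly 7 hours per run before ramp times are added.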

Construction of a full length and infectious molecular clone of 94UG114.1 

A 674 bp fragment spanning most of the viral LTR (lacking 1-92 of U3 
sequences) as well as the untranslated leader sequence preceding gag, was amplified 
from 94UG114 PBMC DNA, using primers and conditions described previously (18). 
After sequence confirmation, this LTR fragment was cloned into the pTZ18MluI 
vector, which was subsequently cleaved with NarI (in the primer binding site) and 
MluI (in the polylinker) to allow the insertion of the 94UG114.1 long PCR product 
cleaved with the same restriction enzymes. The resulting plasmid clone comprised a 
full length 94UG114.1 genome with 3' and 5' LTR fragments containing all regulatory 
elements necessary for viral replication. A similar strategy could be used to construct 
replication competent genomes for all 11 clones reported in this application. 

Sequence analysis of HIV-1 genomes 

A number of the clones described herein were sequenced using the 
shotgun sequencing approach (37). Briefly, viral genomes were released from their 
respective plasmid vectors by cleavage with the appropriate restriction enzymes, 
purified by gel electrophoresis, and sonicated (Model XL2020 Sonicator; Heat System 
Inc., Farmingdale, N.Y.) to generate randomly sheared DNA fragments 600-1,000 bp 
in length. Following purification by gel electrophoresis, fragments were end-repaired 
using T4 DNA polymerase and Klenow enzyme and ligated into SmaI digested and 
dephosphorylated M13 or pTZ18 vectors. Approximately 200 shotgun clones were 
sequenced for each viral genome using cycle sequencing and dye terminator 
methodologies on an automated DNA sequenator (Model 377A; Applied Biosystems, 
Inc.). Sequences were determined for both strands of DNA. Other clones were 
sequenced directly using the primer walking approach (primers were designed 
approximately every 300 bp along the genome for both strands). Proviral contigs were 
assembled from individual sequences using the SEQUENCHER program (Gene Codes 
Corporation, Ann Arbor, Mich.). Sequences were analyzed using EUGENE (Baylor 
College of Medicine, Houston, TX) and MASE (12). 

Phylogenetic tree analysis 

Phylogenetic relationships of the newly derived viruses were estimated 
from sequence comparisons with previously reported representatives of HIV-1 group 
M (45). Multiple gag and env sequence alignments were obtained from the Los 
Alamos sequence database (http://hiv-web.lanl.gov/HTML/alignments.html). Newly 
derived gag and env sequences were added to these alignments using the CLUSTAL 
W profile alignment option (67) and adjusted manually using the alignment editor 
MASE (12). All partial sequences were removed from these alignments. Sites where 
there was a gap in any of the remaining sequences, as well as areas of uncertain 
alignment, were excluded from all sequence comparisons. Pairwise evolutionary 
distances were estimated using Kimura's two parameter method to correct for 
superimposed substitutions (26). Phylogenetic trees were constructed using the 
neighbor-joining method (55), and the reliability of topologies was estimated by 
performing bootstrap analysis using 1,000 replicates (13). NJPLOT was used to draw 
trees for illustrations (49). Phylogenetic relationships were also determined using 
maximum-parsimony (with repeated randomized input orders; ten iterations) as well as 
maximum-likelihood approaches, implemented using the programs DNAPARS and 
DNAML from the PHYLIP package (14). 
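
For illustration, the pairwise distance correction referred to above can be written out directly. The following is a minimal sketch of Kimura's two parameter distance, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions among the compared sites. It is not the code used for the analyses described herein; the function name is hypothetical, and the sketch simply skips any position that is not A, C, G or T in both sequences.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two parameter distance between two aligned sequences.

    Positions containing gaps or ambiguity codes in either sequence
    are skipped (an assumption of this sketch).
    """
    transitions = transversions = compared = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        compared += 1
        if x == y:
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # Raises ValueError (math domain error) if the sequences are saturated.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

A matrix of such distances for all sequence pairs is the input that a neighbor-joining program would take to build a tree.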

Complete genome alignment 

All newly derived HIV-1 genome sequences were aligned with 
previously reported (45) full length representatives of HIV-1 subtype A (U455), B 
(LAI, RF, OYI, MN, SF2), C (C2220), D (ELI, NDK, Z2Z6), and "E" (90CF402.1, 
93TH253.3, CM240) as well as SIVcpzGAB as an outgroup using the CLUSTAL W 
(67) profile alignment option (the alignment includes the untranslated leader sequence, 
gag, pol, vif, vpr, tat, rev, vpu, env, nef and available 3' LTR sequences). Sequences 
that needed to be excluded from any particular analysis were removed only after gap- 
tossing was performed on the complete alignment containing all sequences. This 
ensured that all positions were comparable in different runs with different sequences. 
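
The gap-tossing step described above can be sketched as follows. This is an illustrative example only, assuming the alignment is held as a mapping from sequence name to aligned string and that "-" and "." denote gaps; the function name is hypothetical. Columns are removed on the complete alignment first, and sequences are dropped only afterwards, so that the retained positions are identical from one analysis to the next.

```python
def strip_gapped_columns(alignment, gap_chars="-."):
    """Remove every column in which any sequence has a gap character.

    alignment: dict mapping sequence name -> aligned sequence string.
    Returns a new dict with the same names and gap-free columns only.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have equal length")
    n_columns = lengths.pop()
    keep = [i for i in range(n_columns)
            if all(seq[i] not in gap_chars for seq in alignment.values())]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}


if __name__ == "__main__":
    full = {"query": "AC-GTA", "ref1": "ACTG-A", "ref2": "ACTGTA"}
    stripped = strip_gapped_columns(full)
    # Exclude a sequence only after stripping the complete alignment.
    subset = {k: v for k, v in stripped.items() if k != "ref2"}
    print(subset)
```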

Diversity plots 

The percent diversity between selected pairs of sequences was 
determined by moving a window of 500 bp in 10 bp increments along the genome 
alignment. The divergence values for each pairwise comparison were plotted at the 
midpoint of the 500 bp segment. 
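
A minimal sketch of such a sliding window calculation is given below for illustration. It assumes two gap-stripped sequences of equal length and uses the uncorrected percentage of mismatched sites as the diversity measure, which is an assumption of this sketch; the function name is hypothetical.

```python
def diversity_profile(seq_a, seq_b, window=500, step=10):
    """Percent diversity between two aligned sequences in sliding windows.

    Returns a list of (midpoint, percent_diversity) tuples, one per
    window, with the midpoint expressed in alignment coordinates.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    points = []
    for start in range(0, len(seq_a) - window + 1, step):
        a = seq_a[start:start + window]
        b = seq_b[start:start + window]
        mismatches = sum(x != y for x, y in zip(a, b))
        points.append((start + window / 2, 100.0 * mismatches / window))
    return points


if __name__ == "__main__":
    # Toy example: identical first half, fully divergent second half.
    a = "A" * 1000
    b = "A" * 500 + "G" * 500
    for midpoint, diversity in diversity_profile(a, b)[::10]:
        print(midpoint, diversity)
```

Plotting the second element of each tuple against the first, for each reference sequence in turn, gives a set of curves of the kind described for the diversity plots.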

Bootstrap plots 

Bootscanning was performed on neighbor-joining trees using 
SEQBOOT, DNADIST (using Kimura's correction), NEIGHBOR and CONSENSE 
from the PHYLIP package (14) for a window of 500 bp moved along the alignment in 
increments of 10 bp. 1000 replicates were evaluated for each phylogeny. The program 
ANALYZE from the bootscanning package (57) was used to examine the clustering of 
the putative hybrid with representatives of the subtypes presumed to have been 
involved in the recombination event. The bootstrap values for these sequences were 
plotted at the midpoint of each window. 
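
The windowed bootstrap procedure can be illustrated with a simplified sketch. The example below is not the PHYLIP-based pipeline described above. In place of neighbor-joining trees it uses a deliberately simple stand-in: in each bootstrap replicate the alignment columns of the window are resampled with replacement, and the replicate is counted as supporting reference A when the query has fewer mismatches to A than to B. All names are hypothetical.

```python
import random


def p_distance(a, b):
    """Proportion of mismatched positions between two equal-length lists."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def bootstrap_support(query, ref_a, ref_b, replicates=1000, rng=None):
    """Percent of column-resampled replicates in which query is closer to ref_a."""
    rng = rng or random.Random(0)
    n = len(query)
    wins_a = 0
    for _ in range(replicates):
        # Resample alignment columns with replacement.
        cols = [rng.randrange(n) for _ in range(n)]
        q = [query[i] for i in cols]
        a = [ref_a[i] for i in cols]
        b = [ref_b[i] for i in cols]
        if p_distance(q, a) < p_distance(q, b):
            wins_a += 1
    return 100.0 * wins_a / replicates


def bootscan(query, ref_a, ref_b, window=500, step=10,
             replicates=1000, seed=0):
    """Return (midpoint, support_for_ref_a) for each sliding window."""
    rng = random.Random(seed)
    profile = []
    for start in range(0, len(query) - window + 1, step):
        s = slice(start, start + window)
        support = bootstrap_support(query[s], ref_a[s], ref_b[s],
                                    replicates, rng)
        profile.append((start + window / 2, support))
    return profile


if __name__ == "__main__":
    ref_a = "ACGT" * 50
    ref_b = "TGCA" * 50
    mosaic = ref_a[:100] + ref_b[100:]
    print(bootscan(mosaic, ref_a, ref_b, window=100, step=100, replicates=100))
```

A sharp change in support along the genome, as in the toy mosaic above, is the kind of signal that the bootstrap plots are intended to reveal.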

Exploratory tree analysis 

Exploratory tree analysis was performed using the bootstrap plot 
approach described above, except in this case an increment of 100 bp was used and 
each neighbor-joining tree was viewed using DRAWTREE from the PHYLIP package 
(14). In addition, all full length sequences (except known recombinants) were included 
into the analysis. 

Informative site analysis 

To estimate the location and significance of cross-overs, each putative 
hybrid sequence was compared with a representative of each of the two subtypes 
inferred to have been involved in the recombination event, and an appropriate 
outgroup. Recombination breakpoints were mapped by examining the linear 
distribution of phylogenetically informative sites supporting the clustering of the 
hybrid with each of the two "parental" subtypes, essentially as described (52,53). 
Potential breakpoints were inserted between each pair of adjacent informative sites, 
and the extent of heterogeneity between the two sides of the breakpoint, with respect 
to numbers of the two kinds of informative site, was calculated as a 2 x 2 chi square 
value; the likely breakpoint was identified as that which gave the maximal chi-square 
value. Since the alignments contained more than one putative cross-over, this analysis 
was performed looking for one and two breakpoints at a time, and repeated on 
subsections of the alignment defined by breakpoints already identified. To assess the 
probability of obtaining (by chance) chi-square values as high as those observed, 
10,000 random permutations of the informative sites were examined. 
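
The single-breakpoint form of this calculation can be sketched as follows, for illustration only. The example assumes the informative sites have already been reduced to an ordered list of labels, "A" or "B", according to which parental subtype each site supports. It does not cover the search for two breakpoints at a time, and all names are hypothetical.

```python
import random


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic (no continuity correction) for a 2 x 2 table."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def best_breakpoint(sites):
    """Return (max_chi_square, index) over all breakpoints between sites.

    index i means the breakpoint lies between sites[i - 1] and sites[i].
    """
    total_a = sum(1 for s in sites if s == "A")
    total_b = len(sites) - total_a
    best = (0.0, None)
    left_a = left_b = 0
    for i in range(1, len(sites)):
        if sites[i - 1] == "A":
            left_a += 1
        else:
            left_b += 1
        chi = chi_square_2x2(left_a, left_b,
                             total_a - left_a, total_b - left_b)
        if chi > best[0]:
            best = (chi, i)
    return best


def permutation_p_value(sites, permutations=10000, seed=0):
    """Fraction of random orderings whose maximal chi-square is at least
    as high as the observed maximum."""
    rng = random.Random(seed)
    observed, _ = best_breakpoint(sites)
    shuffled = list(sites)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if best_breakpoint(shuffled)[0] >= observed:
            hits += 1
    return hits / permutations


if __name__ == "__main__":
    sites = list("AAAAAAAAAABBBBBBBBBB")
    print(best_breakpoint(sites), permutation_p_value(sites))
```

To look for further cross-overs, the same functions can be applied again to the subsections of the site list on either side of a breakpoint already identified.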

Nucleotide sequence accession numbers 

GenBank accession numbers for several of the (near) full length HIV-1 
proviral sequences disclosed in this application are listed in Table 2, and are hereby 
incorporated by reference. 

EXAMPLE 2 
Identification of non-subtype B HIV-1 viruses 

Molecular cloning of non-subtype B HIV-1 isolates 

Of the geographically diverse HIV-1 isolates described herein, five had 
previously been classified as members of (group M) subtypes A (92RW009), F 
(93BR020, 93BR029), and G (92NG003, 92NG083) on the basis of env (17,19) and/or 
gag sequences (1). One (90CF056) was chosen because it originated from a major 
epicenter of the African AIDS epidemic. In addition, 90CF056 was of interest because 
it did not fall into any known subtype at the time of its first genetic characterization 
(43). Isolates from Zambia (96ZM651 and 96ZM751) and India (94IN476) were 
chosen because of the known subtype C prevalence in those countries. The two 
isolates from Cyprus (94CY017 and 94CY032) were selected because of the extensive 
diversity of HIV-1 in the drug user population (29). Table 1 summarizes available 
demographic and clinical information, as well as biological data concerning the isolate 
phenotype (SI/NSI). Only viruses grown in normal donor PBMCs were selected for 
analysis. 

The viral genomes were cloned by long PCR methods using primers 
homologous to the tRNA primer binding site (upstream primer) and the 
polyadenylation signal in the 3' LTR (downstream primer). This amplification 
strategy generated (near) full length genomes containing all coding and regulatory 
regions, except for 70 to 80 bp of 5' unique LTR sequences (U5). All isolates, 
regardless of subtype classification, yielded long PCR products with the same set of 
primer pairs. In some instances, genomes were amplified with primers containing 
MluI restriction enzyme sites. This greatly facilitated subsequent subcloning into a 
plasmid vector (Table 2). 

Sequence analysis of (near) full length HIV-1 genomes 

All eleven HIV-1 genomes were sequenced in their entirety using either 
shotgun sequencing or primer walking approaches. The long PCR derived clones 
ranged in size from 8,952 to 8,999 base pairs, and spanned the genome from the 
primer binding site to the R/U5 junction of the 3' LTR. Inspection of potential coding 
regions revealed that all clones contained the expected reading frames for gag, pol, vif, 
vpr, tat, rev, vpu, env and nef. In addition, all major regulatory sequences, including 
promoter and enhancer elements in the LTR, the packaging signal, splice sites, etc., 
appeared to be intact. None of the genomes had major deletions or rearrangements, 
although inspection of the deduced protein sequences identified inactivating mutations 
in seven of the eleven clones (Table 2). However, most of these were limited to point 
mutations in single genes and were thus amenable to repair. Only two genomes 
(92NG003.1 and 92NG083.2) contained stop codons, small deletions and frameshift 
mutations in several genes, rendering them multiply defective. Importantly, no 
inactivating mutations were identified in 93BR020.1 (subtype F), 90CF056.1 (subtype 
H), and 96ZM651.8 (subtype C), suggesting that these clones encoded biologically 
active genomes (Table 2). Nucleic acids containing repaired coding sequences, as 
well as the polypeptides encoded by the repaired coding sequences, are also 
considered to be a part of the invention. 
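
For illustration, the inspection of a reading frame for inactivating mutations can be sketched as a simple scan for premature stop codons and frame disruptions. The example below is a hedged sketch and not the procedure actually used. It assumes the open reading frame has already been extracted as a contiguous DNA string beginning at its start codon, which does not hold for spliced genes such as tat and rev without first joining their exons. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(orf):
    """Return 0-based codon indices of in-frame stop codons that occur
    before the final complete codon of the reading frame."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def is_intact(orf):
    """True if the frame starts with ATG, has a length divisible by three,
    ends with a stop codon and contains no premature stop codon."""
    orf = orf.upper()
    return (orf.startswith("ATG")
            and len(orf) % 3 == 0
            and orf[-3:] in STOP_CODONS
            and not premature_stops(orf))


if __name__ == "__main__":
    print(is_intact("ATGAAATAA"), premature_stops("ATGTAAAAATAA"))
```

A frame flagged by such a scan as carrying a single point mutation would be a candidate for the repair referred to above.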

EXAMPLE 3 
Phylogenetic analyses in gag and env regions 
To determine the phylogenetic relationships of the viruses described 
herein, evolutionary trees from full length gag and env sequences were first 
constructed. This was done to confirm the authenticity of previously characterized 
strains, classify the new viruses, and compare viral branching orders in trees from two 
genomic regions. The results confirmed a broad subtype representation among the 
selected viruses (Fig. 1). Strains fell into six of the seven major (non-B) clades, 
including three for which full length sequences are not available (i.e., F, G and H). 
However, comparison of the gag and env topologies also identified three strains with 
discordant branching orders. 92RW009.6 grouped with subtype C viruses in gag, but 
with subtype A viruses in env. Similarly, 93BR029.4 clustered with subtype B viruses 
in gag, but with subtype F viruses in env. 94CY017.41 appeared to cluster within 
subtype A viruses in env, but fell into an unknown subtype in gag. However, 
characterization of the latter strain is still ongoing. These different phylogenetic 
positions were supported by high bootstrap values and thus indicated that these strains 
were intersubtype recombinants. 

EXAMPLE 4 
Diversity plots 

To characterize the putative recombinants as well as the other strains in 
regions outside gag and env, pairwise sequence comparisons with available full length 
sequences from the database were performed. A multiple genome alignment was 
generated which included the new sequences as well as U455 (subtype A), LAI, RF, 
OYI, MN and SF2 (subtype B), C2220 (subtype C), ELI, NDK and Z2Z6 (subtype D), 
and 90CF402.1, 93TH253.3 and CM240 ("subtype E"). The percent nucleotide 
sequence diversity between sequence pairs was then calculated for a window of 500 bp 
moved in steps of 10 bp along the alignment. Importantly, distance values were 
calculated only after all sites with a gap in any of the sequences were removed from 
the alignment. This ensured that all comparisons were made across the same sites. 

Fig. 2 depicts selected distance plots for the newly characterized 
viruses. For example in Fig. 2A, 93BR020.1 (putative subtype F) is compared to 
U455 (subtype A), NDK (subtype D), C2220 (subtype C) and 90CF056.1 (putative 
subtype H). The resulting plots all exhibit very similar diversity profiles characterized 
by alternating regions of sequence variability and conservation (values range from 7% 
divergence near the 5' and 3' ends of pol, to 30% in the segment of env encoding the 
V3 region). Moreover, the four plots are virtually superimposable, indicating that 
93BR020.1 is roughly equidistant from U455, NDK, C2220 and 90CF056.1 over the 
entire length of its genome. A very similar set of distance curves was also obtained 
from comparisons of 94CY017.41 with 90CF056.1, 92BR025.8, 93BR020.1, U455, 
and NDK (Fig. 2B), and from comparisons of both 93BR020.1 and 90CF056.1 with 
representatives of subtype B and "E" (data not shown). These results indicate that 
93BR020.1 and 90CF056.1 are equidistant from each other as well as from members 
of subtypes A, B, C, D and "E" and, together with the gag and env phylogenetic trees (Fig. 
1), suggest that 93BR020.1 and 90CF056.1 represent non-recombinant members of 
subtypes F and H, respectively. 

Very similar data were also obtained when 90CF056.1 was subjected to 
diversity plot analysis using the same set of reference sequences (Fig. 2F). Again, 
distance curves exhibited very similar profiles indicating approximate equidistance 
among the strains analyzed, except when viruses from the same subtype were 
compared. For example, in Fig. 2C distances between 94IN476.104 (putative subtype 
C) and U455, 93BR020.1, 90CF056.1, NDK and 92BR025.8, respectively, are 
depicted. As expected, the 92BR025.8 (putative subtype C) plot falls clearly below all 
others, indicating the lower level of sequence divergence between viruses from the 
same subtype (ranging from about 4% in pol to about 17% in env). Importantly, 
however, inter- and intra-subtype diversity plots follow each other very closely, i.e., the same 
genomic regions exhibit proportionally higher and lower levels of divergence. See 
also the diversity plot analysis for 96ZM651.8 (Fig. 2G) and 96ZM751.3 (Fig. 2H). 
Thus, both at the level of inter- and intra-subtype comparisons, there was no evidence 
of mosaicism in the genomes of these three viruses. Together with the results in Fig. 
1, this suggests that strains 94IN476.104, 96ZM651.8 and 96ZM751.3 represent non- 
mosaic members of subtype C. 

By contrast, the diversity plots of the putative recombinants 
92RW009.6 (Fig. 2D) and 93BR029.4 (Fig. 2I) exhibited disproportionate levels of 
sequence divergence from different subtypes along their genome, consistent with their 
discordant branching orders in gag and env trees. As shown in Fig. 2D, 92RW009.6 is 
most similar to the subtype C strain C2220 in the 5' half of gag, most of pol, vif, vpr, 
as well as nef (the C2220 curve falls below all others). However, in the 3' end of gag, 
the 5' end of pol and most of env, 92RW009.6 is most similar to the subtype A strain 
U455 (the U455 curve falls below all others). Similarly in Fig. 2I, 93BR029.4 is most 
similar to the subtype B strain LAI in gag, pol and vpr, while it is most similar to the 
putative subtype F strain 93BR020.1 in vif, env and nef regions. In each case, the 
magnitude of the difference between the new sequence and the most similar subtype 
was no greater than the diversity seen within subtypes. Thus, these data suggest that 
92RW009.6 and 93BR029.4 represent mosaics, comprised of subtypes A/C and B/F, 
respectively. In each case, the plots suggested several (at least four) cross-overs; these 
are the minimum number of recombination breakpoints, since the window size used 
makes it unlikely that recombinant regions shorter than 500 bp would be detected. 

Finally, inspection of the diversity plots for 92NG003.1 (Fig. 2J) and 
92NG083.2 (Fig. 2E) also revealed disproportionate levels of sequence variation, 
although not as pronounced as for 92RW009.6 and 93BR029.4. Isolates 92NG003.1 
and 92NG083.2 are equidistant from members of subtypes A-F and H for the most part 
of their genome, suggesting that they represent an independent subtype, i.e., subtype 
G. However, in the vif/vpr region the U455 distance plot falls below all others, 
suggesting a disproportionately closer relationship to subtype A. Assuming that U455 
is non-mosaic, these results suggest that both 92NG003.1 and 92NG083.2 contain 
short fragments of subtype A sequence in the central region of their genome. 

EXAMPLE 5 
Exploratory tree analyses 
To examine the phylogenetic position of the newly derived strains 
relative to each other and to the reference sequences over the entire genome, 
exploratory tree analyses were performed using the same multiple genome alignment 
generated for the diversity plots (Fig. 3). A total of 79 trees were constructed for 
overlapping fragments of 500 bp, moved in 100 bp increments along the alignment. 
As expected, four genomes were identified that clustered in different subtypes in 
different parts of their genome. These included 93BR029.4 which alternated between 
subtypes F and B, 92RW009.6 which alternated between subtypes A and C, and 
92NG083.2 and 92NG003.1 which grouped either independently or within subtype A. 
Interestingly, the latter two strains exhibited distinct patterns of mosaicism. In trees 
spanning the region 3501-4000, 92NG003.1 clustered within subtype A, while 
92NG083.2 clustered independently, presumably representing subtype G. In contrast 
to these strains, there was no evidence for a hybrid genome structure in 94IN476.104, 
96ZM651.8, 96ZM751.3, 93BR020.1 or 90CF056.1. These viruses branched 
consistently in all regions analyzed. Based on these findings and the results from the 
diversity plots, it appeared that five of the eleven selected HIV-1 strains represent non- 
recombinant reference strains for subtypes C (94IN476.104, 96ZM651.8, 
96ZM751.3), F (93BR020.1) and H (90CF056.1), respectively, while at least five are 
intersubtype recombinants. 94CY017.41 may be recombinant, but work is in progress in 
this regard. 

EXAMPLE 6 
Recombination breakpoint analysis 
To map the location of the recombination breakpoints in 92RW009.6
and 93BR029.4, bootstrap plots and informative site analyses were used (18,52,53).
Unrooted trees were constructed which included U455, 92UG037.1, LAI, MN, OYI,
SF2, RF, C2220, 92BR025.8, NDK, ELI, Z2Z6, 93BR020.1 and 90CF056.1; then the
magnitude of the bootstrap values supporting (i) the clustering of 92RW009.6 with
members of subtype A (U455, 92UG037.1) or C (C2220, 92BR025.8), as well as (ii) the
clustering of 93BR029.4 with members of subtype B (LAI, MN, OYI, SF2, RF) or F
(93BR020.1) was determined (in the latter case subtype D viruses were excluded
because of their known close relationship to subtype B viruses). Fig. 4 depicts the
results of 797 such phylogenetic analyses generated for each genome, performed on a
window of 500 nucleotides moved in steps of 10 nucleotides. Very high bootstrap
values (> 80%) supporting the clustering of 92RW009.6 with subtype C were apparent
in gag, the 3' two-thirds of pol, and nef. By contrast, significant branching of
92RW009.6 with subtype A was apparent in the gag/pol overlap and the env region.
In a small region (4,000 to 4,200) in the middle of the genome, 92RW009.6 appeared 
not to cluster significantly with either subtype, but further inspection revealed that this 
was due to a small number of informative sites. These data thus indicated four points 
of recombination crossovers between subtypes A and C (Fig. 4A). A similar analysis 
identified six recombination breakpoints between subtypes B and F in 93BR029.4 
(Fig. 4B). These included two more (in gag) than were apparent from the diversity 
plot analysis (compare Fig. 2), indicating a greater sensitivity of this approach. 
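
The windowing behind such a bootstrap plot can be sketched as follows. This is a minimal illustration only: tree construction and bootstrapping are not shown, the function names are hypothetical, and the alignment length of 8468 is an assumption borrowed from the footnote to Table 3 (the length of the alignment actually used for Fig. 4 is not stated). Under that assumption, a 500 nucleotide window moved in steps of 10 yields 797 windows.

```python
def window_starts(alignment_length, window=500, step=10):
    """Return the 1-based start position of every full window along the alignment."""
    return list(range(1, alignment_length - window + 2, step))


def supported_segments(starts, bootstrap, window=500, threshold=80.0):
    """Merge runs of consecutive windows whose bootstrap value exceeds the threshold.

    Returns (first_window_start, last_window_end) pairs, 1-based and inclusive.
    The bootstrap values themselves must come from a separate phylogenetic analysis.
    """
    segments = []
    current = None
    for start, value in zip(starts, bootstrap):
        if value > threshold:
            end = start + window - 1
            if current is None:
                current = [start, end]
            else:
                current[1] = end
        elif current is not None:
            segments.append(tuple(current))
            current = None
    if current is not None:
        segments.append(tuple(current))
    return segments


# Assumed alignment length (taken from the Table 3 footnote, for illustration only).
starts = window_starts(8468, window=500, step=10)
print(len(starts))
```

Because each window spans 500 nucleotides, the edges of a supported segment localize a cross-over only to within roughly one window width, which is why the text turns to informative site analysis for finer mapping.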

To map the recombination cross-over points in 92RW009.6 and
93BR029.4 more precisely, the distribution of phylogenetically informative sites
supporting alternative tree topologies was examined (52,53). Briefly, this was done
in a four sequence alignment which included the query sequence, a representative of
each of the two subtypes presumed to have been involved in the recombination event,
and an outgroup. Breakpoints were identified by looking for statistically significant
differences in the ratios of sites supporting one topology versus another. Consistent
with the bootscanning data, this analysis identified four breakpoints in 92RW009.6,
and six in 93BR029.4 (Table 3). A schematic representation of the mosaic genomes of
92RW009.6 and 93BR029.4 is depicted in Figure 6.

Table 3. Informative site analysis of 92RW009.6 and 93BR029.4

Clone       Region*      Subtype   Informative sites supporting clustering with

                                   subtype A    subtype C    outgroup
                                   (U455)       (C2220)      (NDK)
92RW009.6   1-1037       C           8            32           8
            1085-1940    A          17             5           4
            1986-5288    C          18            99          27
            5293-7238    A          60             9          13
            7254-8431    C          12            55          12

                                   subtype B    subtype F    outgroup
                                   (LAI)        (93BR020)    (C2220)
93BR029.4   1-735        B          18             6           3
            755-896      F           1            10           0
            930-4247     B          99            10          14
            4340-4668    F           2            15           1
            4787-5166    B          15             0           5
            5244-8242    F          15           139          13
            8250-8429    B          13             0           0

* Numbers mark positions in the four sequence alignment which includes the untranslated leader
sequence (1-120), gag (121-1537), pol (1370-4340), vif (4285-4856), vpr (4799-5073), the first tat exon
(5054-5271), vpu (5276-5488), env (5406-7726), nef (7727-8313) and the 3' LTR (7991-8468). Note
that position 8468 does not correspond to the end of the LTR but is the last position in the alignment
after gaps have been tossed. The 5' LTR is not included in the alignment.
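
The counting step of an informative site analysis can be sketched as below. This is a hedged illustration, not the procedure of refs. 52 and 53: the function names are hypothetical, sequences are assumed to be uppercase and already aligned, and a plain Pearson chi-square on a 2x2 table is substituted for whatever significance test the cited method applies. The example compares the first two 92RW009.6 regions of Table 3 (8 versus 32 sites, then 17 versus 5 sites).

```python
def classify_site(query, parent_a, parent_b, outgroup):
    """Return which pairing a column supports: 'a', 'b', 'outgroup', or None.

    In a four-sequence alignment a column is informative only when exactly two
    nucleotides are present and each is shared by exactly two sequences.
    """
    column = (query, parent_a, parent_b, outgroup)
    if any(base not in "ACGT" for base in column):
        return None
    if query == parent_a and parent_b == outgroup and query != parent_b:
        return "a"
    if query == parent_b and parent_a == outgroup and query != parent_a:
        return "b"
    if query == outgroup and parent_a == parent_b and query != parent_a:
        return "outgroup"
    return None


def count_informative_sites(query, parent_a, parent_b, outgroup):
    """Count the informative sites supporting each of the three possible pairings."""
    counts = {"a": 0, "b": 0, "outgroup": 0}
    for column in zip(query, parent_a, parent_b, outgroup):
        label = classify_site(*column)
        if label is not None:
            counts[label] += 1
    return counts


def chi_square_2x2(a_left, b_left, a_right, b_right):
    """Pearson chi-square (1 d.f., no continuity correction) for a 2x2 table.

    Assumes no row or column total is zero.
    """
    n = a_left + b_left + a_right + b_right
    numerator = n * (a_left * b_right - b_left * a_right) ** 2
    denominator = ((a_left + b_left) * (a_right + b_right)
                   * (a_left + a_right) * (b_left + b_right))
    return numerator / denominator


# Table 3, 92RW009.6: region 1-1037 (A=8, C=32) versus region 1085-1940 (A=17, C=5).
print(round(chi_square_2x2(8, 32, 17, 5), 2))
```

A value well above 3.84 (the 5% critical value for one degree of freedom) indicates that the ratio of subtype A to subtype C sites differs between the two regions, which is the kind of shift that marks a putative breakpoint.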



Because of the lack of a full length subtype G reference sequence, 
recombination breakpoint analysis of 92NG003.1 and 92NG083.2 required a different 
approach. The analyses summarized in Fig. 2 and Fig. 3 suggested that these two 
viruses contained subtype A sequences in the middle of their genome. To attempt to 
confirm this, and to define the extent of these putative subtype A fragments, a more 
detailed diversity plot analysis of the viral middle region (between position 3,000 and 
6,000) was performed using different viral strains and varying window sizes (ranging 
from 200 to 400 bp) to examine the extent of sequence divergence of 92NG083.2 and 
92NG003.1 from members of other subtypes, including subtype A. Diversity plots for 
92NG003.1 compared to U455, C2220, NDK and 92NG083.2 and for 92NG083.2
compared to U455, C2220, NDK and 92NG003.1 depicted representative results
(using a window size of 300 bp moved in steps of 10 bp along the alignment) (data not
shown). Similar to the data shown in Fig. 2, the two "subtype G" viruses are roughly
equidistantly related to members of subtypes A (U455), C (C2220), and D (NDK),
except for two regions in 92NG003.1 and one region in 92NG083.2 where both
viruses are disproportionately more closely related to U455 than they are to each other.
Noting the points at which the "G"-A distance increases or decreases relative to the 
others allowed the tentative identification of recombination breakpoints. For example, 
at position 3400, the U455 plot falls whereas the C2220, NDK and 92NG083.2 plots 
do not, and around site 3600 the U455 plot crosses the 92NG083.2 plot. Bearing in 
mind the window size of 300 nucleotides, this finding suggested that a recombination 
cross-over occurred around position 3500. Similar "G"-A plot crossings around 
positions 3800, 4200 and 5200 (in the diversity plot for 92NG003.1), and around 
positions 4200 and 4800 (in the diversity plot for 92NG083.2), suggested additional 
recombination breakpoints. 
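
Locating the points where one distance curve crosses another can be sketched as follows. The function name and all distance values are hypothetical and chosen only to show the mechanics; they do not reproduce the plots described above, and translating a crossing into a breakpoint estimate still requires allowing for the window size.

```python
def crossings(positions, reference_curve, other_curve):
    """Return (position, direction) events where reference_curve crosses other_curve.

    positions, reference_curve and other_curve are parallel lists; each distance
    value belongs to the window reported at the corresponding position.
    """
    events = []
    for i in range(1, len(positions)):
        before = reference_curve[i - 1] - other_curve[i - 1]
        after = reference_curve[i] - other_curve[i]
        if before >= 0 and after < 0:
            events.append((positions[i], "falls below"))
        elif before < 0 and after >= 0:
            events.append((positions[i], "rises above"))
    return events


# Hypothetical percent distances of a query to a subtype A reference and to a
# second "subtype G" virus; illustrative numbers only.
positions = [3400, 3500, 3600, 3700, 3800, 3900]
to_subtype_a = [12, 11, 8, 7, 9, 12]
to_other_g = [10, 10, 10, 10, 10, 10]
print(crossings(positions, to_subtype_a, to_other_g))
```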

Phylogenetic trees were then constructed using the regions of sequence
defined by these putative breakpoints (Fig. 5). This analysis generally supported the
conclusions drawn from the diversity plots, i.e., 92NG003.1 clustered with subtype A
viruses in the region between 3501 and 3800, whereas 92NG083.2 did not; and both
92NG003.1 and 92NG083.2 clustered with subtype A viruses in the region between 4201 and
4800. However, neither the diversity plot nor the tree analysis allowed the definition
of the boundaries of the subtype A fragments with certainty. Nevertheless, the data
indicated (i) that both 92NG083.2 and 92NG003.1 represent G/A recombinants, (ii)
that they are the result of different recombination events because some of their
breakpoints are clearly different, and (iii) that 92NG083.2 likely encodes a
non-recombinant pol gene. A schematic representation of the mosaic genomes of
92NG083.2 and 92NG003.1 is shown in Fig. 6.

EXAMPLE 7 
Subtype specific genome features 
Having classified the new viruses with respect to their subtype 
assignments, their sequences were examined for clade-specific signature sequences. 
Comparing deduced amino acid sequences gene by gene, several subtype specific 
features were found (Fig. 7). For example, most subtype D viruses contain an
in-frame stop codon in the second exon of tat, which removes 13 to 16 amino acids from
the carboxy terminus of the Tat protein (Fig. 7A). Similarly, all subtype C viruses
(including 94IN476.104, 96ZM651.8 and 96ZM751.3) contain a stop codon in the 
second exon of rev which would be predicted to shorten this protein by 16 amino acids 
(Fig. 7B). Subtype C viruses also contain a 15 base pair insertion at the 5' end of the 
vpu gene (Fig. 7C) which extends the putative membrane spanning domain of the Vpu 
protein by 5 amino acids (data not shown). Although these changes are unlikely to 
alter the function of the respective gene products in a major way (e.g., the known 
functional domains of both Tat and Rev proteins are not affected by these changes), it 
is possible that they could influence their mechanism of action in a subtle (but 
nevertheless biologically important) manner. 

Of the eleven non-subtype B clones identified herein, phylogenetic
analysis identifies five of these viruses as non-recombinant members of subtypes C
(three), F and H, which increases the number of non-subtype B reference strains
available. Among these, the (near) full length genomes of 93BR020.1 and 90CF056.1
represent the first such strains for subtypes F and H, respectively. Five of the other
viruses were found to represent complex mosaics of subtypes A and C; A and G (two);
B and F; and A, G and I. One, 94CY017.41, is not yet fully characterized. Both A/G
recombinants originated from Nigeria, but must have arisen from independent
recombination events since they are not closely related and differ in their patterns of
mosaicism. One of these (92NG083.2) appears to contain only a single short (perhaps
600 bp) segment of subtype A origin in the vif/vpr region and, in the absence of (as yet)
any full length subtype G virus, thus serves as a (non-mosaic) subtype G
representative for the gag, pol, env, and nef regions. Importantly, the genomes were
generated in such a way that they can be tested for biological activity following a
simple reconstruction step. An example of such a reconstructed genome giving rise to
replication competent virus (94UG114.1) demonstrates that this approach is feasible.
See "Materials and Methods," supra, and the schematic diagram in Figure 8.

Given the apparent prevalence of mosaic viruses, it is clear that subtype 
specific reference strains can only be defined as such after comprehensive 
recombination analysis. Small subgenomic fragments or even full length gag and env 
sequences are not sufficient to identify all hybrid genomes. Although multiple
cross-overs are a characteristic feature of retroviral recombination and have been found in
many of the mosaic HIV-1 genomes examined (7,19,53,60,62), the examples of 
92NG003.1 and 92NG083.2 demonstrate that cross-overs may be confined to regions 
outside of gag and env. Thus, elimination of the possibility that a virus is recombinant 
requires the determination of substantial (if not all) portions of its genome. As a 
consequence, subtype specific reference reagents, such as immunogens for cross-clade 
CTL and neutralization assays, should be derived from viral isolates for which a 
complete genome has been characterized. 

These considerations emphasize the need for detailed analyses using 
reliable methods for identification of recombinant viral sequences. The above results 
indicate that diversity plots, depicting the distance between the query sequence and a 
set of reference sequences in moving windows along the genome, represent an 
excellent initial screening tool. The extent of sequence divergence (between any pair
of viruses) varies along the genome, but since all plots are shown in the same graph,
particular regions where the query sequence is anomalously highly similar to (or
divergent from) other sequences can be readily identified. For example, this approach
uncovered the subtype A-like regions in the middle of the putative "subtype G"
genomes 92NG003.1 and 92NG083.2 (Figs. 2J and 2E; Fig. 5). However, the results
from such analyses relying only on extents of sequence divergence must be treated 
with some caution, because they are susceptible to variation in evolutionary rate in 
different lineages. Once suspicious regions have been identified, phylogenetic 
analyses of windows of sequence around these regions can be used to look for 
discordant branching orders, and to identify the subtypes likely to have been involved 
in the recombination event. The bootstrap value supporting the clustering of the query 
sequence with sequences of the supposed "parental" subtypes can be examined, again 
in moving windows along the genome. Finally, informative site analysis can be used 
to map as precisely as possible the breakpoints of the putative recombination events 
(52,53). 

Clearly, recombination analysis relies on the availability of accurately 
defined non-mosaic reference sequences. Thus, location of the breakpoints in the two 
G/A recombinant viruses identified here must remain tentative because of the lack of 
such reference sequences for subtype G. The precise positions of breakpoints in the
recently characterized Thai and CAR "subtype E" viruses are similarly uncertain
(7, 18), in this case for lack of a complete non-mosaic subtype E reference sequence. It
should also be emphasized that currently designated reference sequences may require
revision in the future. For example, the inadvertent inclusion of recombinant
"reference" sequences in previous tree analyses (19, 40) led to an incorrect subtype
assignment of subtype G and "E" gp41 sequences. As more sequences become
available, it is thus possible that one or more of the viral sequences currently 
designated as non-recombinant may be identified as a hybrid. 

EXAMPLE 8
Identification of the HIV-1 Clone 94CY032.3
Full length reference clones and sequences are currently available for 
eight HIV-1 group M subtypes (A - H), but none have been reported for subtypes I 
and J, which have only been identified in a handful of individuals. Phylogenetic 
information for subtype I, in particular, is limited since only a very small env gene 
fragment (400 bp in the C2-V3 region) obtained from only two individuals (a 
heterosexual couple of intravenous drug users from Cyprus) has been analyzed. To 
characterize subtype I in greater detail, long range PCR was employed to clone a full 
length provirus (94CY032.3) from a short-term cultured isolate (94CY032) 
established from one of the two individuals originally reported to be infected with this 
subtype. 

Using primers homologous to the tRNA primer binding site
(5'-TCTCT-acgcgtGGCGCCCGAACAGGGAC-3' (SEQ ID NO: ); lower case
letters indicate an MluI site) and the polyadenylation signal in the 3' LTR
(5'-ACCAGacgcgtACAACAGACGGG-CACACACTACTT-3' (SEQ ID NO: )),
long range PCR was used to amplify near full length genomic fragments, which
contained all coding and regulatory regions except for 102 bp of 5' unique LTR
sequences (U5) (for methodological details concerning the long range PCR approach
see refs. 18, 56, 79). Amplification products were subcloned into a plasmid
vector and mapped by restriction enzyme digestion, and one clone (94CY032.3) was
selected for further analysis. A 694 bp fragment spanning the remainder of the LTR
was amplified separately using a semi-nested approach (18).

The complete sequence of 94CY032.3 was determined using the primer 
walking approach [GenBank accession numbers: AF049337 (genome) and AF049338 
(LTR)]. Examination of potential coding regions revealed the expected reading 
frames for gag, pol, vif, vpr, tat, rev, vpu, env and nef (Fig. 13). None of the genes
contained major deletions, insertions or rearrangements. However, both env and vif 
genes contained single in-frame stop codons (Fig. 13). There was also a frameshift at 
position 5199 (single base pair insertion) which altered the C-terminus (last six amino 
acid residues) of the Vpr protein. All other protein domains of known function as 
well as major regulatory sequences, including the primer binding site, the packaging 
signal and major splice sites, appeared to be intact. Similarly, the number, position 
and consensus sequences of promoter and enhancer elements in the 94CY032.3 LTR 
were indistinguishable from those of most other HIV-1 strains, except for the 
presence of an unusual TATA sequence (TAAAA), thus far only found in "subtype
E" (A/E) viruses from Thailand and the Central African Republic (7, 18).

To compare 94CY032.3 to previously reported subtype I sequences, a 
phylogenetic tree was constructed from C2-V3 sequences, including representatives 
of all 10 known group M subtypes (data not shown). As expected, 94CY032.3 
clustered most closely with CYH0321 and CYH0322, sequences amplified from 
uncultured PBMC DNA of the same individual (H032) from whom the 94CY032 
isolate was derived. 94CY032.3 also clustered very closely with CYH0311, a
sequence derived from the sexual partner of H032 (29), strongly suggesting that the 
two infections were epidemiologically linked. Finally, as observed in the past (29), 
all subtype I sequences clustered independently, forming a distinct lineage roughly 
equidistant from all other subtypes, including subtype J (30). These findings thus 
confirmed the authenticity of the 94CY032.3 clone and validated it as a representative 
of subtype I in the C2-V3 region of the viral envelope. 

To characterize the remainder of the 94CY032.3 genome, pairwise 
sequence comparisons were then performed with recently reported non-mosaic 
reference sequences for subtypes A-H (32, 79) as well as selected intersubtype 
recombinants (83). This approach has been useful for identifying regions of unusual 
sequence similarity (or dissimilarity) as an indicator of recombination (18, 79).
Briefly, 94CY032.3 was added (using the profile alignment option of CLUSTAL W;
27) to a multiple genome alignment which included a total of 28 sequences from the
database (81) representing subtypes A (U455, 92UG037.1), B (LAI, RF, OYI, MN
and SF2), C (C2220, 92BR025.8), D (NDK, Z2Z6, ELI, 84ZR085.1, 94UG114.1), F
(93BR020.1), and H (90CF056.1) as well as A/C (ZAM184, 92RW009.6), A/G
(92NG083.2, 92NG003.1, Z321, IBNG), A/D (MAL), A/E (93TH253.3, CM240,
90CF402.1) and B/F (93BR029.4) recombinants (SIVcpzGAB was included as an
outgroup). All sites with a gap in any of the sequences were removed from the
alignment to ensure that all comparisons were made across the same sites. The 
percent nucleotide sequence diversity between 94CY032.3 and selected other viruses 
was then calculated for sequence pairs by moving a window of 400 bp in steps of 10 
bp along the genome. 
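
The gap stripping and moving-window comparison described above can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, sequences are assumed to be pre-aligned strings of equal length, and the distance is taken to be the uncorrected percentage of differing sites (the text does not state whether a corrected distance was used).

```python
def strip_gapped_columns(sequences, gap="-"):
    """Remove every alignment column in which any sequence carries a gap.

    sequences maps a strain name to its aligned sequence string.
    """
    keep = [i for i, column in enumerate(zip(*sequences.values()))
            if gap not in column]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in sequences.items()}


def percent_diversity(seq_a, seq_b):
    """Uncorrected percentage of sites that differ between two equal-length sequences."""
    differences = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return 100.0 * differences / len(seq_a)


def diversity_plot(query, reference, window=400, step=10):
    """Return (1-based window start, percent diversity) pairs along the alignment."""
    points = []
    for start in range(0, len(query) - window + 1, step):
        points.append((start + 1,
                       percent_diversity(query[start:start + window],
                                         reference[start:start + window])))
    return points


# Toy alignment with hypothetical strain labels; real use would load the
# gap-stripped genome alignment and keep window=400, step=10.
toy = strip_gapped_columns({"query": "AC-GTAC", "ref1": "ACTGTTC", "ref2": "ACTG-AC"})
print(diversity_plot(toy["query"], toy["ref1"], window=2, step=1))
```

Plotting one such curve per reference sequence on shared axes gives the superimposed profiles discussed in the text, so that a single curve dropping below the others stands out.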

Fig. 9 depicts five such distance plots which illustrate the extent of
sequence divergence of 94CY032.3 from representatives of subtypes A (92UG037.1),
B (LAI), C (C2220), D (ELI) and G/(A) (92NG083.2). The analysis yielded a set of
distance curves with very similar (and for the most part superimposable) diversity
profiles, suggesting that 94CY032.3 was roughly equidistant from the other subtypes
in most regions of its genome (the same results were also obtained when 94CY032.3
was compared to representatives of subtypes A/E, F, and H; data not shown).
However, careful inspection of the graphs revealed several small areas of
disproportionate sequence similarity involving two of the five reference sequences.
For example, at the 3' end of gag and the 3' end of pol, 92NG083.2 dropped below all
others, indicating a relatively greater similarity of 94CY032.3 to subtype G. Similarly,
in the 5' end of gag, vif, and the 3' and 5' ends of env, 92UG037.1 fell below all others,
indicating a relatively greater similarity of 94CY032.3 to subtype A. Together, these
results suggested that 94CY032.3 contained subtype A and G-like segments, in
addition to regions that appeared to be equidistant from the other subtypes.

Relative differences in the extent of sequence similarity as determined 
by diversity plots (18, 79) or other methods of distance measurement (75) are not 
always an indicator of recombination, but can reflect variations in the evolutionary 
rates of the lineages compared. To determine whether 94CY032.3 was truly mosaic,
an exploratory tree analysis was then performed to look for significantly discordant
phylogenetic positions for different parts of its genome (Fig. 10). Using the same
multiple genome alignment described above, but excluding all known recombinants
(except 92NG083.2 and 92NG003.1), unrooted trees were constructed for
overlapping fragments of 400 bp, moved in 10 bp increments along the alignment (for
subtypes B and D only three representatives were included). Inspection of the
resulting topologies revealed that 94CY032.3 changed its phylogenetic position a
total of ten times, alternating between subtype A (Fig. 10A, E, G and J; panels
201-600, 4241-4640, 5071-5470 and 6821-7220), subtype G (Fig. 10B, D and H; panels
1101-1500, 3841-4240 and 5471-5870), and an independent position (Fig. 10C, F, I
and K; panels 1751-2150, 4641-5040, 5901-6300 and 7901-8300) that was very
similar to the one observed in the C2-V3 region (all discordant positions were
supported by significant bootstrap values). Since the latter has served as the basis for
subtype I definition, it is most parsimonious to assume that all independently
grouping segments in 94CY032.3 are of a common origin and thus represent "subtype
I". 94CY032.3 thus appears to be comprised of sequences belonging to at least three
different (group M) subtypes.

To map the boundaries of the putative A, G and I segments, bootstrap
plot analyses were performed as previously described (18, 57, 79), plotting the
magnitude of the bootstrap values that supported the clustering of 94CY032.3 with
92UG037.1 (subtype A), as well as that of 94CY032.3 with 92NG083.2 ("subtype
G"). The results of these analyses allowed us to tentatively map the location and
boundaries of the various subtype A and G segments along the 94CY032.3 genome
(Fig. 11). Bearing in mind the window size of 400 nucleotides and considering only
peaks of significant bootstrap values (>80%), we identified two A/G cross-overs
around 1200 and 5600, and one G/A cross-over around 4100. The bootstrap plots
also outlined regions with no peaks (or peaks below 80%), which coincided with
segments that clustered independently (i.e., in subtype I) in the exploratory tree
analysis. Delineating the boundaries of these regions suggested five additional
breakpoints: G/I at 1500, I/G at 3800, G/I at 6000, I/A at 6900, and A/I at 7200.
Because full length non-mosaic reference sequences for the parental lineages (G and
I) were not available, most of the breakpoints could not be mapped with certainty (the
A/G breakpoints at 1200 and 5600 were confirmed by informative site analysis; data
not shown). Also, the recombinant nature of 92NG083.2 prohibited reliable
breakpoint analysis between 4200 and 4800 (32, 79; highlighted in Fig. 11).

To map potential recombination breakpoints in this remaining region, 
four recently reported, partial but non-mosaic subtype G sequences from Mali which 
spanned the vif/vpr region and thus bridged the "subtype A gap" of 92NG083.2 were 
used (77). A set of distance plots that compare 94CY032.3 to one of these newly 
derived G sequences (95ML045) as well as representatives of subtype A (U455), B 
(MN), and D (ELI), respectively, were constructed (data not shown). Consistent with 
the results from the exploratory tree analysis (Fig. 4), 94CY032.3 was
disproportionately more closely related to U455 in the 5' and 3' thirds of this
fragment, suggesting the presence of subtype A-like segments. However, in the
middle of the fragment, 94CY032.3 was clearly equidistant from U455 and the other 
subtypes, suggesting an independent position (diversity plots were generated for a 
window of 300 bp moved in increments of 10 bp). Thus, noting the points at which 
the "A" distance increased and decreased relative to the other distances allowed us to 
tentatively map the two remaining breakpoints, one at 4650 and the other at 5000. 
Trees constructed from sequences surrounding these two breakpoints (Fig. 12) 
confirmed that 94CY032.3 switched position from subtype A (Fig. 12; panel 4255- 
4650) to subtype I (panel 4651-5000), and back to subtype A (5001-5300; note, that 
the new subtype G sequences only cover the region between 4255 and 5300). 

There are a total of 10 recombination breakpoints between the 5' end of
gag and the 3' end of nef in the genome structure of 94CY032.3. However, the
discordant subtype assignments of the gag and nef regions necessitate at least one more
breakpoint in the viral LTR or the gag leader sequence (LTR sequences were not
separately analyzed for mosaicism). Given this extent of mosaic complexity,
94CY032.3 is likely the result of multiple successive recombination events.
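
The bookkeeping behind this count can be sketched by assembling the approximate breakpoints reported above into a segment map. The function name is hypothetical, the "X/Y" notation is read here as 5' subtype / 3' subtype, the positions are the tentative values given in the text, and the outer coordinates 201 and 8300 are assumptions taken from the first and last tree panels rather than true gene boundaries.

```python
# Tentative breakpoints from the text: (approximate position, 5' subtype, 3' subtype).
BREAKPOINTS = [
    (1200, "A", "G"), (1500, "G", "I"), (3800, "I", "G"), (4100, "G", "A"),
    (4650, "A", "I"), (5000, "I", "A"), (5600, "A", "G"), (6000, "G", "I"),
    (6900, "I", "A"), (7200, "A", "I"),
]


def mosaic_segments(breakpoints, start, end):
    """Convert ordered breakpoints into (segment_start, segment_end, subtype) tuples.

    Raises ValueError if the subtype 3' of one breakpoint does not match the
    subtype 5' of the next, i.e. if the list is internally inconsistent.
    """
    segments = []
    segment_start = start
    for i, (position, left, right) in enumerate(breakpoints):
        if i > 0 and breakpoints[i - 1][2] != left:
            raise ValueError(f"inconsistent subtypes at breakpoint {position}")
        segments.append((segment_start, position, left))
        segment_start = position + 1
    segments.append((segment_start, end, breakpoints[-1][2]))
    return segments


# Outer coordinates are assumed for illustration (first and last tree panels).
segments = mosaic_segments(BREAKPOINTS, start=201, end=8300)
for segment in segments:
    print(segment)

# The 5'-most and 3'-most segments differ in subtype, which is why at least one
# further breakpoint is expected outside the analyzed region.
print(segments[0][2] != segments[-1][2])
```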

Having identified several fragments of subtype I in 94CY032.3,
evidence for its presence in other (full length) recombinants from the database was
examined (data not shown). Two known mosaics, MAL (53, 76) and Z321 (78), were
of particular interest, because previous analyses had indicated that these viruses
contain regions of uncertain subtype assignment (53, 82, 83). For example, MAL has
long been known to represent a mosaic of subtypes A and D, but also contains a
sizable pol fragment that has defied previous subtype classification (53, 83).
Similarly, Z321 is a known mosaic of subtypes A and G (78), but a recent re-analysis 
of its recombination breakpoints identified regions that could not be assigned to any 
known subtype (82, 83). To determine whether any of these regions represented 
subtype I, distance plot analysis was performed, comparing the diversity profiles of 
MAL and Z321 with those for representatives of other subtypes. Looking for dips in 
the curves as an indication of relatively greater sequence similarity, one in the pol 
region of MAL and another in the vif/vpr region of Z321 were found to coincide with 
previously unclassified segments of their genomes (indicated as white boxes). 
Phylogenetic tree analysis confirmed that these regions were indeed of subtype I 
origin, since MAL and Z321 clustered significantly with the subtype I domains of 
94CY032.3. Interestingly, subtype I did not account for all of the unclassifiable 
regions in MAL and Z321 (82, 83). It thus remains unclear whether these represent 
still other, as yet unidentified, subtypes or regions of multiple breakpoints that cannot 
be mapped using current methods. 

The above results demonstrate that a strain of HIV-1, proposed in 1995
as a prototypic "subtype I" isolate (29), represents a complex mosaic comprised of 
subtypes A, G and I, respectively. In addition, two of the oldest known isolates from 
Africa, MAL (isolated in 1984) (76) and Z321 (isolated in 1976) (80, 84), are shown 
to contain short segments of sequence closely related to the subtype I domains of 
94CY032.3. These findings support the following conclusions: (i) although initially 
detected in Cyprus, subtype I must have existed in Africa as early as 1976; it is 
unknown whether full length non-mosaic representatives of subtype I still exist (but 
have not yet been sampled), or whether this subtype (like subtype E) is represented 
only by fragments in present day recombinants; (ii) the ancestry of 94CY032.3 must 
have involved multiple successive recombination events; it remains unclear whether 
this occurred in Africa and/or in Cyprus, where a number of different subtypes have 
also been documented (29); (iii) subtype I, along with subtypes A and G, must have 
diverged substantially earlier than the 1970s in order to be detectable as distinct 
segments in the Z321 genome; this is consistent with the recent molecular 
characterization of a virus from 1959 which in phylogenetic analyses appears to have 
postdated the group M radiation (85); (iv) finally, the finding of subtype I in several
different recombinants, including one from an intravenous drug user (29), suggests 
that this subtype may be more widespread than previously thought, at least in the 
form of mosaic genome fragments. It will be interesting to screen additional viruses 
from drug user populations and their contacts in Cyprus and Greece to determine the 
current prevalence and geographic distribution of subtype I containing viruses. 

Modifications of the above described invention that are obvious to
those of skill in the fields of genetic engineering, immunology, virology, protein 
chemistry, medicine, and related fields are intended to be within the scope of the 
following claims. 

All of the references cited herein above are hereby incorporated by
reference.

CLAIMS

We claim: 

1. A nucleic acid comprising the nucleotide sequence of the genome of a non- 
subtype B HIV-1 virus, wherein said nucleotide sequence is selected from sequences 
shown in Fig. 13. 

2. A nucleic acid comprising a sequence of at least 12 contiguous bases 
derived from the nucleic acid of claim 1.

3. A nucleic acid comprising the nucleotide sequence of a LTR derived from 
the nucleic acid of claim 1. 

4. A nucleic acid encoding a polypeptide selected from the group consisting 
of Gag, Pol, Vif, Vpr, Env, Tat, Rev, Nef and Vpu, wherein the polypeptide is encoded 
by the genome of a virus selected from the group consisting of 92RW009.6, 
92NG003.1, 92NG083.2, 93BR020.1, 93BR029.4, 90CF056.1, 94CY032.3, 
94CY017.41, 96ZM651.8, 96ZM751.3, and 94IN476.104. 

5. A nucleic acid according to claim 4 having a nucleotide sequence derived 
from any one of the nucleotide sequences shown in Fig. 13. 

6. A nucleic acid comprising a sequence complementary to the sequence of a 
nucleic acid of any one of claims 1-5. 

7. A vector comprising a nucleic acid of any one of claims 1-5. 

8. A cell comprising the nucleic acid of any of claims 1-5. 

9. A cell comprising the vector of claim 7. 

10. A composition comprising a nucleic acid of any one of claims 1 to 5, and a 
physiologically acceptable carrier. 

11. A vector comprising a nucleic acid of claim 6.

12. A cell comprising the nucleic acid of claim 6.

13. A cell comprising the vector of claim 11.

14. A composition comprising a nucleic acid of claim 6, and a physiologically 
acceptable carrier. 

15. A polypeptide encoded by the nucleic acid of claim 1. 

16. The polypeptide of claim 15 comprising a contiguous sequence of at least 
13 amino acids. 

17. A composition comprising a polypeptide of any one of claims 15 to 16, and 
a physiologically acceptable carrier. 

18. A method for producing a polypeptide of claim 15, said method comprising 
growing the cell of claim 8 under conditions such that the encoded polypeptide is 
produced. 

19. A method for producing a polypeptide of claim 15, said method comprising 
growing the cell of claim 9 under conditions such that the encoded polypeptide is 
produced. 

20. A method for producing a polypeptide of claim 15, said method comprising 
growing the cell of claim 12 under conditions such that the encoded polypeptide is 
produced. 

21. A method for producing a polypeptide of claim 15, said method comprising 
growing the cell of claim 13 under conditions such that the encoded polypeptide is 
produced. 

22. A method of inducing serum antibodies that bind at least one polypeptide 
of claim 15, said method comprising, administering to a mammal, in a physiologically 
acceptable carrier, an amount of polypeptide of any one of claims 15 or 16 sufficient 
to elicit production of said antibodies. 

23. An antibody to a non-subtype B HIV-1 virus made by the method of claim 22.

24. A method of inducing serum antibodies that bind at least one polypeptide
of claim 15, said method comprising administering to a mammal, in a physiologically 
acceptable carrier, a nucleic acid of any one of claims 1, 2 or 4 which encodes a 
polypeptide and which produces an immunologically sufficient amount of the encoded 
polypeptide to elicit said antibodies. 

25. An antibody to a non-subtype B HIV-1 virus made by the method of claim 24.

26. A method for detecting the presence of a non-subtype B HIV-1 virus in a 
sample comprising contacting said sample with an antibody of claim 23 under 
conditions that allow the formation of an antibody-antigen complex and detecting said 
complex. 

27. A method for detecting the presence of a non-subtype B HIV-1 virus in a 
sample comprising contacting said sample with an antibody of claim 25 under 
conditions that allow the formation of an antibody-antigen complex and detecting said 
complex. 

28. A method for detecting the presence of antibodies to a non-subtype B HIV-1
virus in a sample comprising contacting said sample with a polypeptide according to
any one of claims 15 or 16 under conditions that allow the formation of an
antibody-antigen complex and detecting the complex.

29. A kit for detecting the presence of a non-subtype B HIV-1 virus in a 
sample comprising an antibody of claim 23. 

30. A kit for detecting the presence of a non-subtype B HIV-1 virus in a 
sample comprising an antibody of claim 25. 

31. A method for detecting the presence of a non-subtype B HIV-1 virus in a 
sample comprising contacting said sample with a nucleic acid of any one of claims 1 
to 5 and detecting said nucleic acid bound to the genomic DNA, mRNA or cDNA of 
the non-subtype B HIV-1 virus.

32. A method for detecting the presence of a non-subtype B HIV-1 virus in a
sample comprising contacting said sample with a nucleic acid of claim 6 and detecting 
said nucleic acid bound to genomic DNA, mRNA or cDNA of the non-subtype B 
HIV-1 virus. 

33. A kit for detecting the presence of a non-subtype B HIV-1 virus in a 
sample comprising a nucleic acid of any one of claims 1 to 5. 

34. A kit for detecting the presence of a non-subtype B HIV-1 virus in a 
sample comprising a nucleic acid of claim 6. 

35. A composition comprising an antibody according to claim 23 or 25, and a 
physiologically acceptable carrier. 

36. A nucleic acid probe comprising a sequence of at least 19 contiguous 
nucleotides derived from the nucleic acid of claim 1, or the complementary sequence 
thereof.

37. A method of detecting the presence of a non-subtype B HIV-1 virus in a 
biological sample comprising: 

(a) contacting the nucleic acid of the biological sample with a nucleic acid 
probe of claim 36; and 

(b) detecting the presence or absence of complexes formed between said 
nucleic acid of the biological sample and said nucleic acid probe. 

38. A method of detecting the presence of a non-subtype B HIV-1 virus in a 
biological sample comprising: 

(a) contacting said biological sample with at least two nucleic acid probes of 
claim 36; 

(b) amplifying the RNA of the biological sample via reverse transcription-
polymerase chain reaction to produce amplification products; and

(c) detecting the presence or absence of amplification products. 

39. A composition comprising a nucleic acid probe according to claim 36.

40. A method for analyzing a first nucleotide sequence comprising comparing 
the nucleotide sequence of any one of the nucleotide sequences set forth in Fig. 13 
with said first sequence. 

41. A method for analyzing a first amino acid sequence comprising comparing 
the amino acid sequence of any one of the amino acid sequences set forth in Figs. 14- 
22 with said first sequence. 

93BR020.1 CCACAAGATTTAAACACCATGTTAAATACAGTGGGGGGACATCAACCAGCCATGCAAATG 720

92NG083.2 T C G T 755 

90CF056.1 C TG-T C G 736 

92RW009.6 C 733 

92NG003.1 G- -T C C--C G T 756 

93BR029.4 C C T 741 

94CY032.3 G TG C T C--G A 755 

96ZM651.8 733 

96ZM751.3 C 727 

94CY017.41 C T--T C C T 746 

94IN476.104 --CTCT 728

93BR020.1 TTAAAAGACACCATCAATGAGGAGGCTGCAGAATGGGACAGATTACATCCAACACAGGCA 780

92NG083.2 C G--T--T T--A G GA CAG 815

90CF056.1 T--A A GG GTG- -T 796 

92RW009.6 C T--A G T- - GG GTG 793 

92NG003.1 C TT-T--T A--A G T--GC CA 816 

93BR029.4 A A G GTG- -T 801 

94CY032.3 T A C GAC GT---T--- 815 

96ZM651.8 T--T T GTG - - T 793 

96ZM751.3 T A T--G GT---T--- 787 

94CY017.41 T A GG GT---T--- 806

94IN476.104 T C G T GT---T--- 788

93BR020.1 GGACCCATCCCCCCAGGTCAGATAAGGGAACCTAGGGGAAGTGATATAGCTGGAACTACT 840

92NG083.2 --G--T--T--A C--A A--G T A 875

90CF056.1 --G--T--T--A C--A--G--A A C A 856

92RW009.6 --G--TG-TG-G C A A C A 853

92NG003.1 T--T--A C A A A 876 

93BR029.4 T A C G 861 

94CY032.3 --G--T--T--A C G- -A A A 875 

96ZM651.8 --G--T--TG-A C--A--G--A A A 853

96ZM751.3 --G--T--TG-A C--A A A C A 847

94CY017.41 --G--T--T--A C G--A A C A 866

94IN476.104 --G--T-AT--A C G--A A A 848

93BR020.1 AGTACCCTTCAGGAACAAATACAATGGATGACAGGCAACCCACCTGTCCCAGTGGGAGAA 900

92NG083.2 G AG CA A 935 

90CF056.1 G GC T---G--A C 916 

92RW009.6 GC AAT A-T 913 

92NG003.1 G AC CA A 936 

93BR029.4 A 921 

94CY032.3 A GG A 935 

96ZM651.8 C--A G---GC A-T- -T- -C- - -A-T C 913 

96ZM751.3 G GC AAT A-T C 907 

94CY017.41 GGT CA--G-T CA 926 

94IN476.104 GC T A-T C 908 

93BR020.1 ATGTATAAAAGATGGATCATCCTAGGATTAAATAAAATAGTAAGAATGTATAGCCCTGTC 960

92NG083.2 --C A G G 995 

90CF056.1 --C A G G T 976 

92RW009.6 --T A--T--G--G 973 

92NG003.1 --T A--T--G--G G 996

93BR029.4 --T AC 981

94CY032.3 --C A---T-G--G C CA-T 995 

96ZM651.8 --C A--T--G--G 973 

96ZM751.3 --C A--T--G--G C 967 

94CY017.41 A G 986 

94IN476.104 --C A--T--G--G 968 

93BR020.1 GGCATTTTGGACATAAGACAAGGGCCAAAAGAACCCTTTAGAGACTATGTAGACAGGTTC 1020

92NG083.2 A T T 1055

90CF056.1 A C A T 1036

92RW009.6 A A A G T C 1033

92NG003.1 A-T A A C T G--T 1056 

93BR029.4 A C---G A G TC-A--T 1041 

94CY032.3 A C A C T T 1055 

96ZM651.8 A A G C 1033 

96ZM751.3 A C A A G G TC 1027 

94CY017.41 A T G- -T 1046 

94IN476.104 A G C 1028 

93BR020.1 TTTAAAACCCTAAGAGCTGAGCAAGCTACACAGGAAGTAAAGGGTTGGATGACAGACACC 1080

92NG083.2 TT-G A 1115 

90CF056.1 TT C T--G AA A 1096 

92RW009.6 T C--A T A--T AAA T 1093 

92NG003.1 TT-G C G AAAC 1116 

93BR029.4 -A T A A--T T AAA A 1101 

94CY032.3 TGT--C A--A C G--G--AAA A 1115

96ZM651.8 --C TT A--G A AAA 1093

96ZM751.3 TT A A--T AA . 1086

94CY017.41 T C G AAAC G 1106 

94IN476.104 TT A A A 1088 

93BR020.1 TTGTTGGTCCAAAATGCGAACCCAGATTGTAAGACCATTTTAAAAGCATTGGGACCAGGG 1140

92NG083.2 T A C G A A 1175 

90CF056.1 A--T C T--A G A A 1156 

92RW009.6 A G A--G 1153 

92NG003.1 CC G A G A 1176 

93BR029.4 C- -A CA 1161 

94CY032.3 C--C T C T C A A 1175 

96ZM651.8 A C A 1153 

96ZM751.3 T GG A 1146 

94CY017.41 C GAT C--G-G A 1166 

94IN476.104 A G A 1148 

93BR020.1 GCTACACTAGAGGAAATGATGACAGCATGTCAGGGAGTGGGAGGACCTAGCCATAAGGCA 1200

92NG083.2 A C A 1235 

90CF056.1 T--A A T A 1216 

92RW009.6 T--T A C CG A 1213 

92NG003.1 A T A C C--A 1236 

93BR029.4 A G CG A 1221 

94CY032.3 T A C A 1235 

96ZM651.8 T A A C--A 1213 

96ZM751.3 T A G G C--A 1206 

94CY017.41 --CT--T A C A 1226 

94IN476.104 T--T A G A C--A 1208 

93BR020.1 AGAGTTTTGGCTGAGGCAATGAGCCAAGCAACAAAT ACAGCT . . . ATAATGATG 1251 

92NG083.2 A G T--GG-GCAGCAG AGCC 1295

90CF056.1 T . . . ACA- ATA -AGCC 1273

92RW009.6 --G A T-CA- CAAC--AAC. . . 1264

92NG003.1 A GG GG- . . . ACAT AGCC 1293

93BR029.4 A- -A T . . . TCAGGTA- C . . . 1275

94CY032.3 A G T . . . GCAG AGCC 1292

96ZM651.8 G A AT-G- GTA-A- . . . . C 1264

96ZM751.3 A T- - AC ACA-A- .... 1257

94CY017.41 --G A T-T-CA- -G- . . . ACA-ATA-AAAC 1283 

94IN476.104 G T--CAT-G- -A-.... 1256 

93BR020.1 CAGAAAAGTAACTTTAAGGGCCAAAGAAGAATTGTTAAATGCTTTAATTGTGGCAAAGAA 1311

92NG083.2 C--T CG A G--T--C--C G 1355

90CF056.1 G-C A-T C--C G 1333

92RW009.6 G-G-C--T G A G--T--C--C 1324

92NG003.1 AC--T CG GG-A G--T--C--C G 1353

93BR029.4 G-G-C--T G-AA AG-C-A G--T--C 1335

94CY032.3 C--A C-A G--T--C--C G 1352

96ZM651.8 C--T A--AA-T-A G T C T--G 1324

96ZM751.3 C--T A CT-A T--C--C GG 1317

94CY017.41 G-G-C--T G T A ...A G--T--C--C G 1340

94IN476.104 G-G-C--T A CT-A C--C G 1316

93BR020.1 GGACACATAGCCAAAAATTGCAGGGCCCCTAGAAAAAAGGGCTGTTGGAAGTGTGGAAGA 1371

90CF056.1 G G A 1393

94IN476.104 --G G GA A GCA- 1376

93BR020.1 ATGATTTGTATGTAGGGTCTGACTTAGAAATAGGACAGCATAGAACAAAAATAGAAGAGT 2490

92NG083.2 A A G G G 2531 

90CF056.1 A G--A G G 2509 

92RW009.6 C C A G--A G G 2494 

92NG003.1 A A C---G G G- -A- 2529 

93BR029.4 A G T--G G--A- 2514 

94CY032.3 G--A G C 2528 

96ZM651.8 CC A G--A G 2491 

96ZM751.3 C A T G--A--C---G G 2502 

94CY017.41 C A T A-C GT G--A- 2519 

94IN476.104 C G--T G G 2483 

93BR020.1 TAAGAGAACATCTACTGAAATGGGGATTAACTACACCAGACAAAAAACATCAAAAAGAAC 2550

92NG083.2 G--C T G 2591

92RW009.6 T-A--G T--C G G 2554

96ZM651.8 T-A--G T--C G G 2551

96ZM751.3 C T G T G--G G G- 2562

94CY017.41 G-CT--CT--T TTA G 2579

94IN476.104 C T-A--G C--C T--G G 2543 

93BR020.1 CCCCATTCCTTTGGATGGGGTATGAACTCCATCCTGATAAATGGACAGTGCAGCCTATAC 2610

92NG083.2 -T A G C G--A--A 2651 

90CF056.1 T A C A---A--G--A 2629 

92RW009.6 -T T T C A--A 2614 

92NG003.1 -T T--C A G C G--A--A 2649

93BR029.4 -T T A G 2634 

94CY032.3 T C 2648 

96ZM651.8 T C A 2611 

96ZM751.3 T C A A 2622 

94CY017.41 -T T A G--T C C A 2639 

94IN476.104 T C A A 2603 

93BR020.1 AATTGCCAGACAAGGACAGCTGGACTGTCAATGATATACAGAAGTTAGTAGGAAAACTAA 2670

92NG083.2 -GC A AGAT A G 2711

90CF056.1 --C A--A G 2689

92RW009.6 -GC A T G T 2674 

92NG003.1 -GC A A- -A A- -A G 2709 

93BR029.4 TGC A- -A C G T-G- 2694 

94CY032.3 --CC-G A T C C G 2708 

96ZM651.8 -GC--G A--A--T T G T 2671

96ZM751.3 -GC A G G T 2682 

94CY017.41 -GC A- -A A G ---T--- 2699 

94IN476.104 -GC A T G T 2663 

93BR020.1 ATTGGGCAAGTCAGATTTATCCAGGGATTAAAGTAAAACAATTATGTAAACTCCTTAGGG 2730

92NG083.2 G- - CC G 2771 

90CF056.1 AAT G C 2749 

92RW009.6 -C C G GG G 2734 

92NG003.1 G C 2769 

93BR029.4 G GG 2754 

94CY032.3 G T 2768 

96ZM651.8 -C CG GG---C-T 2731 

96ZM751.3 -C---. CG GG---C-G 2741 

94CY017.41 G G C-G A- 2759 

94IN476.104 -C C G-GG---C-T 2723 

93BR020.1 GAGCCAAGGCACTAACAGACATAGTGCCACTGACTACAGAAGCAGAGTTAGAATTGGCAG 2790

92NG083.2 -G A A--C--A--GG AA-G--GC 2831

90CF056.1 -G A T A-A A-A G A--G 2809

92RW009.6 --A A---T A A---GA A 2794

92NG003.1 -G A A GA A 2829

93BR029.4 --A A AG A A--AG G C C 2814

94CY032.3 T--A--C A A G A 2828 

96ZM651.8 A A A GA A 2791 

96ZM751.3 A A T GA G A 2801 

94CY017.41 A AA A A-A- - 2819 

94IN476.104 -G A A A GA A A 2783 

[Fig. 13-13: nucleotide sequence alignment, continued. Only the 93BR020.1 reference rows are legible in this copy; the difference rows for the other ten clones could not be recovered.]

93BR020.1 GTGAAGGGGCAGTAGTCATACAAGACAATAGTGAAATAAAGGTAGTTCCAAGAAGAAAAG 4350

93BR020.1 CAAAGATCATTAGGGATT [VIF start] ATGGAAAACAGATGGCAGGTGATGATTGTGTGGCAGGTAGAC 4410

93BR020.1 AGGATGAGGATTAG [POL end] CACATGGAAAAGTTTAGTAAAATACCATATGCATATTTCAAAGAAA 4470

93BR020.1 GCCAAAGGATGGTTTTATAGACATCACTTTGAAAGCAGGCATCCAAAAATAAGTTCAGAA 4530

93BR020.1 GTACACATCCCACTAGAGACAGCTGAATTAGTAATAACAACATACTGGGGGCTGCTTCCA 4590

Fig. 13-13



[Fig. 13-14: nucleotide sequence alignment, continued. Only the 93BR020.1 reference rows are legible in this copy; the difference rows for the other ten clones could not be recovered. The sheet also carries a "VIF end" annotation whose position is not recoverable.]

93BR020.1 GGAGAAAGAGAATGGCATCTGGGTCAGGGAGTCTCCATAGAATGGAGGCAGGGGAGGTAT 4650

93BR020.1 AGAACACAAATAGACCCTGGCCTGGCAGACCAACTGATCCATATATATTATTTTGATTGT 4710

93BR020.1 TTTTCAGAATCTGCCATAAGGAAAGCCATATTAGGACATAAAATTAGCCCTAGGTGTAAC 4770

93BR020.1 TATCAAGCAGGACATAACAAGGTAGGATCTCTACAATACTTGGCACTAACAGCATTAATA 4830

93BR020.1 GCTCCAAAAAAGACAAAGCCGCCTTTGCCTAGTGTCCAGAAACTAGTAGAAGACAG [VPR start] ATGG 4890

93BR020.1 AACAAGCCCCAGAAGACCAGGGGCCACAGAGAGAGCCATACAATGAATGGACACTAGATC 4950

Fig. 13-14

[Fig. 13, continued: nucleotide sequence alignment. Only the 93BR020.1 reference rows are legible in this copy; the difference rows for the other ten clones could not be recovered. End coordinates are shown only where they survive.]

93BR020.1 TTTTAGAGGAGCTTAAGAATGAAGCTGTTAGACATTTTCCTAGGCCATGGCTCCATAGCT 5010

93BR020.1 TAGGACAACATATCTATAACACCTATGGGGATACTTGGGAAGGAGTTGAAGCAATCATAA 5070

93BR020.1 GGATATTGCAACAACTACTGTTTATCCATTTCAGAATTGGGTGCCGTCATAGCAGAATAG 5130

93BR020.1 TACTCGACAGAGAAGAG.TAAGAA [TAT 1st exon start] ATGGAACTAGTAGATCCTAA [VPR end] TTAGATCC (row only partially legible)

93BR020.1 CTGGAACCATCCAGGAAGCCAGCCTACAACTCCTTGTACCAGATGTTATTGTAAATGGTG

93BR020.1 TTGCTTTCATTGTTACTGGTGCTTTACAACGAAGGGCTTAGGCATCTCCT [REV 1st exon] ATGGCAGGAA

93BR020.1 GAAGCGGAGACAGCGACCAAGAACTCCTCAAAGCAGTCAGATACATCAAGATTTTGTACC 5366

93BR020.1 AAAGCA [TAT/REV 1st exon end] GTAAGTATTGTTA . . AGCATATGTA [VPU start] ATGTCAAATTTGT 5408

93BR020.1 TAGCAATAGGCATAGCAGCATTAATAGTAGCACTAATAATAACAATAGTTGTGTGGACTA 5468

93BR020.1 TAGCATATATAGAATATAAGAAACTGGTAAGGCAAAGAAAAATAAATAGGTTATATAAAA 5528

93BR020.1 GAATAAGCGAAAGAGCAGAAGACAGTGGCA [ENV start] ATGAGAGTGAGGGGGATGCAGAGGAATTGG 5588

93BR020.1 CAGCACTTGGGGAAGTGGGGCCTTTTATTCCTGGGGACATTAATAATCTGTAA [VPU end] TGCTGCA 5648

93BR020.1 GAAAACTTATGGGTTACAGTCTATTATGGGGTACCTGTGTGGAAAGAAGCAACCACTACT 5708

92NG083.2 --T G C G T GAT--CC-C 5742

90CF056.1 C G A G AA--C 5717

92RW009.6 A-C G T C C GAG--C--C 5715

92NG003.1 A-T G C A G C GAT--CC-- 5721

93BR029.4 A-C...--G C 5732

94CY032.3 A-C G C T G C GAG--C--C 5747

96ZM651.8 -GG G C AA 5714 

96ZM751.3 -GG--A--G C C AA 5734 

94CY017.41 --T...--G C A--C A T GAT--C-TC 5737

94IN476.104 -GG G C AA 5705

93BR020.1 CTATTCTGTGCATCAGATGCTAAATCATATGAAAAAGAGGCACATAATGTCTGGGCTACA 5768

92NG083.2 C--T AGTTCT--AAA 5802

90CF056.1 GG G-C AAAG 5777

92RW009.6 T T G TCC AAAG 5775 

92NG003.1 T T G AGT-CT--AAG C 5781 

93BR029.4 T G A 5792 

94CY032.3 T A G G A-T A 5807 

96ZM651.8 G A-TG 5774 

96ZM751.3 T T G G-C A-TG T 5794 

94CY017.41 T G T-C A-TG A C 5797 

94IN476.104 T G-T G--G TG A 5765 

93BR020.1 CATGCTTGTGTACCCACAGATCCCAATCCACAAGAAGTAGTTCTGGAAAATGTAACAGAA 5828

92NG083.2 C C--T--C A C-A-A 5862 

90CF056.1 A C C GA-G--CA G G 5837 

92RW009.6 C T C G-C A--CA-T G 5835

92NG003.1 C C C GA- -AC 5841 

93BR029.4 C 5852

94CY032.3 C C C C AT G 5867 

96ZM651.8 C C C A T G 5834 

96ZM751.3 C C C A-G T 5854 

94CY017.41 C C C A- - AAC 5857 

94IN476.104 C C C GA-G- A-T-A-T 5825

93BR020.1 AGGTTTAATATGTGGGAAAATAACATGGTAGAACAAATGCATACAGATATAATCAGTTTA 5888

92NG083.2 -AT C A-G G GGAG 5922 

90CF056.1 --C G--G--G C 5897 

92RW009.6 GA C A G--G C-- 5895 

92NG003.1 -CT C A G GAG 5901 

93BR029.4 -AT G A 5912 

94CY032.3 -AC C A G G GAG 5927 

96ZM651.8 -AT C A G G--T--G GAG 5894

96ZM751.3 -A C G--T--G GAG 5914 

94CY017.41 -AT A G--G AGA C 5917 

94IN476.104 -AT C A G G--T--G GAG G 5885

93BR020.1 TGGGATCAAAGCCTAAAGCCATGTGTGAAGTTAACCCCACTCTGTGTTACTTTAGATTGT 5948

92NG083.2 GG A C T--T A-C A-C 5982

90CF056.1 T-G--A A--A C--A-C 5957

92RW009.6 C A T C G 5955

92NG003.1 G A C T C A-C 5961 

93BR029.4 G CG 5972 

94CY032.3 A--G--G A CA C T-T TACA 5987

96ZM651.8 A G C A 5954

96ZM751.3 C A G C A-C 5974 

94CY017.41 A--A G C--C-T A 5977 

94IN476.104 A G C A-C 5945 

93BR020.1 AGAAACATTGCCACCAATGGCACCAATGACACTATT GC CAT CAATGAC 5996

92NG083.2 -CT- -TG-AAA- -GTGC-AATCAT-C G GC A- - 6021

90CF056.1 -CT - -TG-CAGA-A AC-T-T--CAG --C-AG- 5993

92RW009.6 -AC CA ATGTCAA- -A- -C- -T CATT 5991

92NG003.1 - CT - - TG - CAATTGT - - CA- T - ATGTGAC - - GC - C - GGGAACAGTGCT - GG - C C-CT 6021

93BR029.4 --T--TGCCA-T CA-T- -TC-AA- - . -G-CAC- 6008

94CY032.3 -TT- -TGCAA-T- -T-C-AAT-GT-CCA -T 6017

96ZM651.8 -C-G-GG- -AATGTT-CCA-A-ATGT-A-T-A- -GCGTGGTTAATAATA CA TT 6014

96ZM751.3 - CTGCT - A - ATA CAATG-T ATA - -C-AC AAT- -T-AT-TAAC- 6025

94CY017.41 - - C - - TGCCAAT GCAC-CAT-GCA-T-G --GT-GCAC- 6019 

94IN476.104 - -T- -GG- -A- - . . . AATG-T-C- 5969 

93BR020.1 TAAACCAGTGGTATCCACTCAATTGTTGTTAAATGGCAGCCTAGCAGAAG...GAGAGAT 6373

92NG083.2 G A C-AC--C-G TT ...A T-- 6395

90CF056.1 --G A C--C-A A ...A-C 6364

92RW009.6 C--G A A GC--C T ...A 6377

92NG003.1 A AC-AC T--T ...A-- 6407

93BR029.4 A A...A T-- 6400

94CY032.3 G A C A T ACG-...A G- 6421

96ZM651.8 G A C-AC T ...A--G 6409

96ZM751.3 G A A AC T--T ...A 6396

94CY017.41 A-C A C--C G T GAG-GA-A-- 6417

94IN476.104 G A C-AC T AC A...A 6355

93BR020.1 AGTAATCAGATCTCAAAATATCTCAGATAATGCAAAAACCATAATAGTGCACCTTAATGA 6433

92NG083.2 -AG T G T--A C A-C GT G A- 6455

90CF056.1 CA T A--A C A-C A A--G GAC 6424

92RW009.6 -A T G TA--A-C C A--A C-- 6437

92NG003.1 T G CC--A C C GT A--G A- 6467

93BR029.4 -A A--A T 6460

94CY032.3 T A A C A-C AT A--G GCAA- 6481

96ZM651.8 -A T G C-GA--A-C TC A A--T AG 6469

96ZM751.3 -A T A GA C C TA A--T 6456

94CY017.41 -A-G--T G TA--A-C C A T--GT C-A- 6477

94IN476.104 -A CT G AA C T A--T 6415 

93BR020.1 ATCTGTACAGATTAATTGTACAAGACCCAACAACAATACAAGAAAAAGAATATCTTTAGG 6493

92NG083.2 TAG-A--G-A TC T T C-AA-C-- 6515

90CF056.1 -C-A A-C CA C--G--T T G C T CA 6484

92RW009.6 GA A T-C T TG--CA-A 6497

92NG003.1 -A--A--GGA C T AGAA-C-- 6527

93BR029.4 G-C T C-AA 6520

94CY032.3 GG A-A C TGG TG--CA-A 6541

96ZM651.8 A--G-A GTG GT T C T AGAA 6529

96ZM751.3 G-A GTG T G TG-GAGGA 6516

94CY017.41 GC TA C TC T CGC--T-- 6537

94IN476.104 CA--A-A GTA T--C T AGGA 6475

93BR020.1 ACCAGGACGAGTATTTTATACAACAGGAGAAATAATAGGAGACATCAGAAAGGCACATTG 6553

92NG083.2 A--CG--C G T--T A C-A 6575

90CF056.1 G C C G T--C--C T--A C-A 6544

92RW009.6 A--C G T--CG G--T--A C-A T 6557

92NG003.1 T A--CG--C G T C-A 6570

93BR029.4 C 6580

94CY032.3 G T-AC--GG G T T--A C-A 6601

96ZM651.8 A-AC C G C A C-A 6589

96ZM751.3 A-AC C G A-T--A C-A T 6576

94CY017.41 A--CC--C AT- A- . . . G A C-A 6594

94IN476.104 G A--C C G AAC-GC A C-A 6535 

93BR020.1 TAACGTTAGTGGAACACAATGGAGGAACACGTTAGCAAAGGTAAAGGCAAAGTTAGGGTC 6613

92NG083.2 T A T-A G-G-T AAG--T--C-CA C--C--A--AA 6635

90CF056.1 TA A G-C AT--G--T CACC GTTA--C-A AAT 6604

92RW009.6 --CT--C-A A AT-GA--T CA GCA-A A A-TCA 6617

92NG003.1 G CA-G-G-T CAG C C-AC AACA 6614

93BR029.4 A ATG-G A G AA-C- 6640

94CY032.3 A ATG-T ATG C AA-GT-A GT-A-G-A--GAAAAG 6661

96ZM651.8 A A-G--TA-C CT--G--T CG-G G-AAC--A A-AGA 6649

96ZM751.3 A-C A-GGCA AT TC--CA--G GGT-A A A-AAA 6636

94CY017.41 TA-C-ACAA TT ATG T CA GCT-A-C-A A-AGA 6654

94IN476.104 A A-T-TA-C CT--A--T CA-G GGAAA A CAAA 6595

93BR020.1 TTATTTCCCTAAT...GCAACAATAAAATTTAACTCATCCTCAGGAGGGGACCTAGAAAT 6670

92NG083.2 AATC-ATAA . . . AAG-AC CC TG 6692 

90CF056.1 AC-C--GAAC . . .AG GC GC - - AA A-G G- 6661 

92RW009.6 C - - C - - TGAG- - CATTA TT GAAC G TT 6677 

92NG003.1 GGTC--TAAC-- - - GT CC G 6668

93BR029.4 -C ... 6697 

94CY032.3 ACTC . . .AA GCTC--C--GT 6718

96ZM651.8 AC-C . . . AA- - AC C AC 6706 

96ZM751.3 A--C . . .AA GC GCAC 6693 

94CY017.41 GA-A G. . .AA C TC CTAAC C 6711 

94IN476.104 GC-C . . .AA GT- -C CA 6652 

93BR020.1 TAAAGTAGTAGAAATTGAACCACTAGGAGTAGCACCCACCAAGGCAAAAAGACAAGTGGT 7045

92NG083.2 AC A CA--T G GG AG 7067

90CF056.1 A G A A GG AG 7060 

92RW009.6 A G G- -GAG 7073 

92NG003.1 A A A A GG AG 7034 

93BR029.4 G T 7060

94CY032.3 A-G A AT GG G 7111 

96ZM651.8 G A-G T-G A TG GAG 7096

96ZM751.3 G A-G T T CG GAG 7086 

94CY017.41 A--C T G AG 7113 

94IN476.104 G A-G T A T--TG-A GAG 7054

93BR020.1 GAAGAGAGAAAGAAGAGCAGTGGGACTAGGAGCTCTGTTCCTTGGGTTCTTGGGAGCAGC 7105

92NG083.2 -G A T G G-C A 7127 

90CF056.1 -G A A-G TCT 7120 

92RW009.6 -G A T G G-C A A 7133

92NG003.1 -G G--A G T G G-C A 7094

93BR029.4 A A-G T 7120

94CY032.3 -C A-T A G--CA 7171 

96ZM651.8 -G A A G 7156 

96ZM751.3 -G A A A G 7146 

94CY017.41 -G A T G G-C 7173 

94IN476.104 -G A A G T 7114 

93BR020.1 TGGAAGCACTATGGGCGCGGCGTCAATAACGCTGACGGTACAGGCCAGACAATTATTGTC 7165 

92NG083.2 A--G C T 7187 

90CF056.1 A G 7180 

92RW009.6 A 7193 

92NG003.1 A G T 7154 

93BR029.4 A A C 7180

94CY032.3 A A G 7231 

96ZM651.8 A A C G-G 7216 

96ZM751.3 A A--A T 7206 

94CY017.41 A C 7233 

94IN476.104 A G G 7174 

COMBINED DECLARATION AND POWER OF ATTORNEY FOR
ORIGINAL, DESIGN, NATIONAL STAGE OF PCT, SUPPLEMENTAL,
DIVISIONAL, CONTINUATION OR CONTINUATION-IN-PART APPLICATION

As a below named inventor, I hereby declare that:

My residence, post office address and citizenship are as stated below next to my name.

We believe we are the original, first and joint inventors of the subject matter which is claimed and for which a patent
is sought on the invention entitled:

REFERENCE CLONES AND SEQUENCES FOR NON-SUBTYPE B ISOLATES OF HUMAN 

IMMUNODEFICIENCY VIRUS TYPE 1 

the specification of which 

a. [X] is attached hereto 

b. [ ] was filed on as application Serial No. and was amended on 

. (if applicable). 

PCT FILED APPLICATION ENTERING NATIONAL STAGE 

c. [ ] was described and claimed in International Application No. filed on and 

as amended on . (if any). 

We hereby state that we have reviewed and understand the contents of the above-identified specification, including 
the claims, as amended by any amendment referred to above. 

We acknowledge the duty to disclose information which is material to the examination of this application in 
accordance with Title 37, Code of Federal Regulations, § 1.56(a). 

We hereby specify the following as the correspondence address to which all communications about this application 
are to be directed: 

SEND CORRESPONDENCE TO: MORGAN & FINNEGAN, L.L.P. 

345 Park Avenue 
New York, N.Y. 10154 

DIRECT TELEPHONE CALLS TO: Eugene Moroz 
(212) 758-4800 

[ ] I hereby claim foreign priority benefits under Title 35, United States Code § 119(a)-(d) or under 
§ 365(b) of any foreign application(s) for patent or inventor's certificate or under § 365(a) of any PCT international 
application(s) designating at least one country other than the U.S. listed below and also have identified below such 
foreign application(s) for patent or inventor's certificate or such PCT international application(s) filed by me on the 
same subject matter having a filing date within twelve (12) months before that of the application on which priority is 
claimed: 

[ ] The attached 35 U.S.C. § 119 claim for priority for the application(s) listed below forms a part of this 
declaration. 
Country/PCT     Application Number     Date of filing (day, month, yr)     Date of Issue (day, month, yr)     Priority Claimed 

                                                                                                              [ ] YES [ ] NO 

                                                                                                              [ ] YES [ ] NO 

                                                                                                              [ ] YES [ ] NO 

[ ] I hereby claim the benefit under 35 U.S.C. § 119(e) of any U.S. provisional application(s) listed below. 
Provisional Application No. Date of Filing (day, month, yr) 



ADDITIONAL STATEMENTS FOR DIVISIONAL, CONTINUATION OR CONTINUATION-IN-PART 
OR PCT INTERNATIONAL APPLICATIONS (DESIGNATING THE U.S.) 



I hereby claim the benefit under Title 35, United States Code § 120 of any United States application(s) or under 
§ 365(c) of any PCT international application(s) designating the U.S. listed below. 



US/PCT Application Serial No. Filing Date Status (patented, pending, abandoned)/ 

U.S. application no. assigned (For PCT) 



US/PCT Application Serial No. Filing Date Status (patented, pending, abandoned)/ 

U.S. application no. assigned (For PCT) 



[ ] In this continuation-in-part application, insofar as the subject matter of any of the claims of this 
application is not disclosed in the above listed prior United States or PCT international application(s) in the manner 
provided by the first paragraph of Title 35, United States Code, § 112, I acknowledge the duty to disclose material 
information as defined in Title 37, Code of Federal Regulations, § 1.56(a) which occurred between the filing date of 
the prior application(s) and the national or PCT international filing date of this application. 

I hereby declare that all statements made herein of my own knowledge are true and that all statements made on 
information and belief are believed to be true; and further that these statements were made with the knowledge that 
willful false statements and the like so made are punishable by fine or imprisonment, or both, under Section 1001 of 
Title 18 of the United States Code and that such willful false statements may jeopardize the validity of the 
application or any patent issued thereon. 

I hereby appoint the following attorneys and/or agents with full power of substitution and revocation, to prosecute 
this application, to receive the patent, and to transact all business in the Patent and Trademark Office connected 
therewith: John A. Diaz (Reg. No. 19,550), John C. Vassil (Reg. No. 19,098), Alfred P. Ewert (Reg. No. 19,887), 
David H. Pfeffer, P.C. (Reg. No. 19,825), Harry C. Marcus (Reg. No. 22,390), Robert E. Paulson (Reg. No. 21,046), 
Stephen R. Smith (Reg. No. 22,615), Kurt E. Richter (Reg. No. 24,052), J. Robert Dailey (Reg. No. 27,434), Eugene 
Moroz (Reg. No. 25,237), John F. Sweeney (Reg. No. 27,471), Arnold L Rady (Reg. No. 26,601), Christopher A. 
Hughes (Reg. No. 26,914), William S. Feiler (Reg. No. 26,728), Joseph A. Calvaruso (Reg. No. 28,287), James W. 
Gould (Reg. No. 28,859), Richard C. Komson (Reg. No. 27,913), Israel Blum (Reg. No. 26,710), Bartholomew 
Verdirame (Reg. No. 28,483), Maria C.H. Lin (Reg. No. 29,323), Joseph A. DeGirolamo (Reg. No. 28,595), Michael 
A. Nicodema (Reg. No. 33,199), Michael P. Dougherty (Reg. No. 32,730), Seth J. Atlas (Reg. No. 32,454), Andrew 
M. Riddles (Reg. No. 31,657), Bruce D. DeRenzi (Reg. No. 33,676), Michael M. Murray (Reg. No. 32,537) and 
Mark J. Abate (Reg. No. 32,527) of Morgan & Finnegan, L.L.P., whose address is: 345 Park Avenue, New York, 
New York, 10154; and Edward A. Pennington (Reg. No. 32,588) of Morgan & Finnegan, L.L.P., whose address is 
1775 Eye Street, Suite 400, Washington, D.C. 20006. 

[ ] I hereby authorize the U.S. attorneys and/or agents named hereinabove to accept and follow instructions 

from 

_ as to any action to be taken in the U.S. Patent and Trademark Office 



regarding this application without direct communication between the U.S. attorneys and/or agents and me. 
In the event of a change in the person(s) from whom instructions may be taken I will so notify the U.S. 
attorneys and/or agents hereinabove. 



Full name of sole or first inventor Beatrice H. Hahn 

Inventor's signature*                                        date 

Residence 3571 Rockhill Road, Birmingham, AL 35223 

Citizenship Germany 

Post Office Address SAME AS ABOVE 



Full name of second joint inventor George M. Shaw 

Inventor's signature*                                        date 

Residence 3571 Rockhill Road, Birmingham, AL 35223 

Citizenship U.S.A. 

Post Office Address SAME AS ABOVE 



Full name of third joint inventor Feng Gao 

Inventor's signature*                                        date 

Residence 2308 Mountain Oaks Lane, Hoover, AL 35226 

Citizenship China 

Post Office Address SAME AS ABOVE 
* Before signing this declaration, each person signing must: 

1. Review the declaration and verify the correctness of all information therein; and 

2. Review the specification and the claims, including any amendments made to the claims. 

After the declaration is signed, the specification and claims are not to be altered. 
To the inventor(s): 

The following are cited in or pertinent to the declaration attached to the accompanying application: 

Title 37, Code of Federal Regulations, § 1.56 

Duty to disclose information material to patentability. 

(a) A patent by its very nature is affected with a public interest. The public interest is best served, and 
the most effective patent examination occurs when, at the time an application is being examined, the Office 
is aware of and evaluates the teachings of all information material to patentability. Each individual 
associated with the filing and prosecution of a patent application has a duty of candor and good faith in 
dealing with the Office, which includes a duty to disclose to the Office all information known to that 
individual to be material to patentability as defined in this section. The duty to disclose information exists 
with respect to each pending claim until the claim is canceled or withdrawn from consideration, or the 
application becomes abandoned. Information material to the patentability of a claim that is canceled or 
withdrawn from consideration need not be submitted if the information is not material to the patentability 
of any claim remaining under consideration in the application. There is no duty to submit information 
which is not material to the patentability of any existing claim. The duty to disclose all information known 
to be material to patentability is deemed to be satisfied if all information known to be material to 
patentability of any claim issued in a patent was cited by the Office or submitted to the Office in the manner 
prescribed by §§1.97(b)-(d) and 1.98. However, no patent will be granted on an application in connection 
with which fraud on the Office was practiced or attempted or the duty of disclosure was violated through 
bad faith or intentional misconduct. The Office encourages applicants to carefully examine: 

(1) prior art cited in search reports of a foreign patent office in a counterpart application, and 

(2) the closest information over which individuals associated with the filing or prosecution of 
a patent application believe any pending claim patentably defines, to make sure that any 
material information contained therein is disclosed to the Office. 

Title 35, U.S. Code § 101 
Inventions patentable 

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of 
matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and 
requirements of this title. 

Title 35, U.S. Code § 102 

Conditions for patentability; novelty and loss of right to patent 

A person shall be entitled to a patent unless - 



(a) the invention was known or used by others in this country, or patented or described in a printed 
publication in this or a foreign country, before the invention thereof by the applicant for patent, 

(b) the invention was patented or described in a printed publication in this or a foreign country or in 
public use or on sale in this country, more than one year prior to the date of application for patent in the United 
States, or 

(c) he has abandoned the invention, or 

(d) the invention was first patented or caused to be patented, or was the subject of an inventor's 
certificate, by the applicant or his legal representatives or assigns in a foreign country prior to the date of the 
application for patent in this country on an application for patent or inventor's certificate filed more than twelve 
months before the filing of the application in the United States, or 

(e) the invention was described in a patent granted on an application for patent by another filed in the 
United States before the invention thereof by the applicant for patent, or on an international application by another 
who has fulfilled the requirements of paragraphs (1), (2), and (4) of section 371(c) of this title before the invention 
thereof by the applicant for patent, or 

(f) he did not himself invent the subject matter sought to be patented, or 

(g) before the applicant's invention thereof the invention was made in this country by another who had not 
abandoned, suppressed, or concealed it. In determining priority of invention there shall be considered not only the 
respective dates of conception and reduction to practice of the invention, but also the reasonable diligence of one 
who was first to conceive and last to reduce to practice, from a time prior to conception by the other . . . 

Title 35, U.S. Code § 103 

Conditions for patentability; non-obvious subject matter 

A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are such 
that the subject matter as a whole would have been obvious at the time the invention was made to a person having 
ordinary skill in the art to which said matter pertains. Patentability shall not be negatived by the manner in which 
the invention was made. 

Subject matter developed by another person, which qualifies as prior art only under subsection (f) or (g) of 
section 102 of this title, shall not preclude patentability under this section where the subject matter and the claimed 
invention were, at the time the invention was made, owned by the same person or subject to an obligation of 
assignment to the same person. 

Title 35, U.S. Code § 112 (in part) 
Specification 

The specification shall contain a written description of the invention, and of the manner and process of 
making and using it, in such full, clear, concise and exact terms as to enable any person skilled in the art to which it 
pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode 
contemplated by the inventor of carrying out his invention. 
Title 35, U.S. Code § 119 



Benefit of earlier filing date in foreign country; right of priority 

An application for patent for an invention filed in this country by any person who has, or whose legal 
representatives or assigns have, previously regularly filed an application for a patent for the same invention in a 
foreign country which affords similar privileges in the case of applications filed in the United States or to citizens of 
the United States, shall have the same effect as the same application would have if filed in this country on the date 
on which the application for patent for the same invention was first filed in such foreign country, if the application in 
this country is filed within twelve months from the earliest date on which such foreign application was filed; but no 
patent shall be granted on any application for patent for an invention which had been patented or described in a 
printed publication in any country more than one year before the date of the actual filing of the application in this 
country, or which had been in public use or on sale in this country more than one year prior to such filing. 



Title 35, U.S. Code § 120 

Benefit of earlier filing date in the United States 

An application for patent for an invention disclosed in the manner provided by the first paragraph of section 
112 of this title in an application previously filed in the United States, or as provided by section 363 of this title, 
which is filed by an inventor or inventors named in the previously filed application shall have the same effect, as to 
such invention, as though filed on the date of the prior application, if filed before the patenting or abandonment of or 
termination of proceedings on the first application or an application similarly entitled to the benefit of the filing date 
of the first application and if it contains or is amended to contain a specific reference to the earlier filed application. 

Please read carefully before signing the Declaration attached to the accompanying Application. 

If you have any questions, please contact Morgan & Finnegan, L.L.P. 



FORM:COMB-DEC.NY 
Rev. 5/21/98 